Introduction to IT Tools for Translation
1. IT AND TRANSLATION
INTRODUCTION
Dr. Mohammed H. Al Aqad
Senior Lecturer – MSU
alakkadmohmad@yahoo.com
Intro to CAT Tools
3. Rationale for IT Applications to Translation
“A computer is a device that can be used to magnify human
productivity. Properly used, it does not dehumanize by imposing
its own Orwellian stamp on the products of human spirit …
Translation is a fine and exacting art, but there is much
about it that is mechanical and routine, if this were given over to a
machine, the productivity of the translator would not only be
magnified but this work would become more rewarding, more
exciting, more human.”
Martin Kay (1987)
4. COURSE OVERVIEW
ESSENTIALS
TEXT PROCESSING
MT
TM
WORKING WITH CORPORA
TERMINOLOGY EXTRACTION AND
GLOSSARY PRODUCTION (MONOLINGUAL
AND BILINGUAL CORPORA)
5. COURSE OVERVIEW - DETAILS
1) ESSENTIALS:
Types of computer aids
CAT vs. MT
History of CAT tools
General principles of working with CAT tools
Reference materials
Localization and internationalization
UNIX
6. COURSE OVERVIEW - DETAILS
2) TEXT PROCESSING
Word and WordPad (tips and tricks)
Fonts, code pages, keyboard layout,
language tools in Windows XP and Office
Speech recognition software
Scanning
OCR
File types (essential info on the most
common file types and file conversion
utilities)
7. COURSE OVERVIEW - DETAILS
3) MT
How it works, brief exhibition:
Systran Pro
PROMT
Neuro Tran
Babelfish
DESKTOP BASED
SUPPORTS CROATIAN (partially
Serbian)
WEB BASED
8. COURSE OVERVIEW - DETAILS
4) TM:
Overview (what it is, standards and file
formats)
Desktop vs. server based TM programs
WinAlign
WordFast
Trados (nowadays SDL Trados) –
Freelance edition
Sisulizer
10. COURSE OVERVIEW - DETAILS
6) TERMINOLOGY EXTRACTION AND
GLOSSARY PRODUCTION
Essentials
Doing it automatically: Trados (i.e. SDL)
MultiTerm (Desktop and Extract)
Doing it semi-automatically: ParaConc,
Concordancer
11. COURSE REQUIREMENTS
Basic computer literacy
Positive outlook:
Computers don’t bite
CAT tools are not complex; they are actually
designed to make you more efficient
Interest in translation
Willingness to become several times
more efficient in doing translations
12. SCHEDULE
HONESTLY, WE DON’T KNOW FOR
CERTAIN!
THAT’S WHY WE NEED YOUR EMAIL
ADDRESSES, SO THAT WE CAN
KEEP YOU UPDATED WITH THE
LATEST SCHEDULE DEVELOPMENTS
PROBABLY: LOCATION: 25 (lectures)
and 38 (computer lab), SATURDAYS,
AT 16:00
13. LITERATURE
Geoffrey Samuelsson-Brown, A Practical Guide for Translators
(Topics in Translation), Multilingual Matters, 4th edition (May 28,
2004)
H. L. Somers (Editor), Computers and Translation: A Translator's
Guide (Benjamins Translation Library, 35), John Benjamins
Publishing Co, 1st edition (May 2003)
Bert Esselink, A Practical Guide to Localization (Language
International World Directory), John Benjamins Publishing Co,
Revised 1st edition (September 2000)
Silvia Pavel and Diane Nolet, Handbook of Terminology,
Translation Bureau of Canada, 1st edition (2001)
Frank Austermuhl, Electronic Tools for Translators (Translation
Practices Explained), St. Jerome, 1st edition, (April 2001)
14. COURSE OVERVIEW - GRADING
This is a hands-on course
You will be graded on the basis of the
results of your practical assignments:
Creating TMs from parallel texts (fiction and
non-fiction, e.g. a book and a manual) – in a
way, you will also be creating a parallel
corpus
Translating two short passages (fiction and
non-fiction) using your newly created TMs
16. TYPES OF COMPUTER AIDS
Computer aids / tools that are relevant to
translators can be roughly classified into three
groups:
Basic input and editing tools
Reference tools
Productivity tools
17. CAT vs MT
As soon as you start using computer software
in the process of translating, you are entering
the realm of COMPUTER-AIDED
TRANSLATION, or CAT in short.
In other words, CAT is a form of translation
wherein a human translator translates texts
using computer software designed to support
and facilitate the translation process.
18. CAT vs MT (continued)
The problem is that COMPUTER-AIDED
TRANSLATION is sometimes also called
COMPUTER-ASSISTED TRANSLATION,
MACHINE-AIDED TRANSLATION or
MACHINE-ASSISTED TRANSLATION.
Due to the latter two terms, CAT is sometimes
confused with MACHINE TRANSLATION, or
MT in short.
19. CAT vs MT (continued)
Although these two concepts are related and
similar in some aspects, CAT and MT denote
two diametrically different processes:
In CAT, the computer program merely supports
the translator, so the translator translates the text
himself/herself, making all the essential decisions
involved.
In MT, the translator supports the machine, that
is to say: the computer (i.e. program) translates
the text, which is then edited by the translator, or,
in most cases, not edited at all.
20. CAT vs MT (continued)
Graphically represented, the difference is:
[Figure: Translation Technology Continuum, running from full automation to full human involvement. Adapted from Hutchins & Somers (1992).]
Automatic Translation / Machine Translation: translation process automated by use of Machine Translation
Computer-aided Translation (CAT): translation process aided by electronic tools such as (most typically) Translation Memory
Unaided Translation: translation process not aided by any electronic tools
21. CAT – its scope
CAT is traditionally associated with large-
scale / corporate translations:
manuals and technical documentation
software localization
“Typewriter-assisted” (i.e. traditional)
translation is usually associated with small-
scale / individual translations (done by
freelancers):
fiction books, scientific papers, etc.
22. CAT – its scope (continued)
This notion of CAT being restricted to
corporate translation projects dates back to the
90s and is based exclusively on financial criteria:
during the early and mid 90s a combination of a
high-end computer and a high-end CAT tool cost as
much as a new car
from their very beginnings CAT tools were designed
to be capable of handling both big- and small-scale
projects, but initially no freelance translator could
afford them
23. CAT – its scope (continued)
Even for a freelance translator, the CAT route is
nowadays the only possibility if one wants to
provide high-quality, 100%
terminologically consistent and efficiently
produced translations.
A testimony to that is the industry-standard
TM program Trados: Trados Freelance edition
has been the company’s best-selling TM
program for a number of years.
24. CAT tools – a bit about their history
CAT tools were developed after (very)
disappointing initial experiments with MT
tools.
So, in order to give you a proper overview of
how we got where we are now, we have to
start with the history of MT tools
25. MT History – how we switched to CAT
MT research began in the 1950s – Warren
Weaver’s 1949 Memo:
“When I look at an article in Russian, I say: This is
really written in English, but it has been coded in
some strange symbols. I will now proceed to
decode.”
(in Locke and Booth 1955:18)
26. MT History – how we switched to CAT
Initially based on some misconceptions about
human translation:
knowledge of two language systems suffices
it is merely a matter of looking up dictionaries
it is easy to define “a good translation”
there is only one correct translation possible
27. MT history milestones: pre-ALPAC
1954: Georgetown system demo
successful translation of 49 Russian sentences into English
1955-1966: $50m spent in 20 research centres in USA
1966: Automatic Language Processing Advisory
Committee (ALPAC) Report concludes:
“...MT is slower, less accurate and twice as expensive as Human Translation...”
“...there is no prospect of useful MT either immediately or in the future...”
28. MT history milestones: post-ALPAC
1969 – privately funded projects
Logos system (1969); Weidner-CAT (1977); ALPS (1980)
1975 – Météo project in Canada
1976 – European Commission acquires Systran
1979 – Eurotra project in Europe for Multilingual system
1980 – PC-based system
1990 – data-driven system; WebMT
29. 1975 Météo project in Canada
Automatic translation of weather forecasts (En -> Fr)
Sublanguage approach (domain-specific MT)
Most successful MT application to date
•public broadcasting since 1977
•Fr -> En available since 1989
•only 4% of output needs post-editing
•rapid translation staff turnover no longer a problem
30. Renewed interest in MT in the late 80s and early 90s:
Technological factors
specifically: prevalence of PCs with improved processing power
Translation market factors
official bilingualism/multilingualism creates institutional
needs
globalisation creates huge commercial needs
Advances in computational linguistics
More realistic user expectations
Internet creates casual access to multilingual information
31. However, translations produced by MT were
still not reliable and accurate enough for
large-scale commercial applications.
So, it became evident that the human
translator cannot be eliminated and
replaced by computers.
Actually, it became obvious that computer
programs should be used as TOOLS which
only HELP the translator.
32. History of CAT Tools
Unreliability of MT tools -> large
corporations hire translation agencies
Translation agencies find it difficult to cope
with the increasing demand
Translation agencies develop their own in-
house CAT tools
Translation agencies begin to sell their CAT
tools
33. History of CAT Tools
Two major players in the domain of CAT
tool development, Trados and STAR Group,
both started as:
TRANSLATION AGENCIES!!!
34. TRADOS – timeline
1990 – first version of TRADOS's main
component, MultiTerm, was created for DOS
1992 – TRADOS developed the first MultiTerm
for Windows (v3.1)
1992 – TRADOS’s Translator's Workbench with
linguistic fuzzy-matching on translation
memories for DOS
1994 – TRADOS’s Translator's Workbench for
Windows
35. TRADOS – timeline (continued)
1997 – BREAKTHROUGH: Microsoft
decides to base its internal localization
memory store on TRADOS
1998 – Microsoft acquires a share of 20% in
TRADOS
36. WHAT DO WE WANT TO
TEACH YOU HERE?
TWO PRACTICAL EXAMPLES OF COMMON
TRANSLATION PROBLEMS
38. IMPORTANT THINGS TO NOTE:
• (quite obvious) the book has an index =
YOU (i.e. the translator) are supposed to
make it in the translated version of the
book
• a vast index = a lot of terminology
• some index terms appear on several
pages that are not necessarily in the same
chapter (e.g. pg. 36, pg. 92 and pg. 255) =
a very serious problem for the consistency
of your translation
44. General principles of working with CAT tools
The main goals are EFFICIENCY and
CONSISTENCY
CAT tools = TM tools (in this case only)
The basic idea is fairly simple:
Documents, especially technical ones, contain a
large amount of content that is similar or identical
to information already contained in earlier
versions or similar documents that have been
translated before.
That applies to the source editing language (SL) as
well as the target translation languages (TL).
45. General principles of working with CAT tools
So, wouldn’t it be great to re-use previously
translated content as valuable reference material for
new translations as well so as to obtain consistency
of terminology and phrasing?
That is exactly what CAT tools do!
CAT tools make it possible for translators to work
only on content that is being created for the first
time. Existing text and text similar to existing text
are taken from the available reference translations
(i.e. from the TM = translation memory).
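The re-use idea above can be sketched in a few lines of Python. This is only a toy illustration, not how Trados or any other commercial TM tool works internally: the `TM` dictionary, the `lookup` function, the 0.75 threshold and the French sample translations are all assumptions made for this example, and character-level similarity from the standard `difflib` module stands in for the linguistic fuzzy matching real tools use.

```python
from difflib import SequenceMatcher

# Toy translation memory: source segment -> stored translation.
# Segments and translations are invented for illustration only.
TM = {
    "Press the power button.": "Appuyez sur le bouton d'alimentation.",
    "Remove the battery cover.": "Retirez le couvercle de la batterie.",
}


def lookup(segment, tm, threshold=0.75):
    """Return (match_type, score, stored_source, stored_target) or None.

    An identical segment is an exact match. Otherwise the most similar
    stored segment is offered as a fuzzy match if its similarity score
    reaches the (hypothetical) threshold.
    """
    if segment in tm:
        return ("exact", 1.0, segment, tm[segment])

    best_source, best_score = None, 0.0
    for source in tm:
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score > best_score:
            best_source, best_score = source, score

    if best_source is not None and best_score >= threshold:
        return ("fuzzy", best_score, best_source, tm[best_source])
    return None  # nothing reusable: the translator works from scratch


if __name__ == "__main__":
    for text in ["Press the power button.",
                 "Press the reset button.",
                 "Completely unrelated sentence about weather."]:
        print(text, "->", lookup(text, TM))
```

An exact match can be inserted as-is, a fuzzy match is shown to the translator for editing, and a segment with no match is translated from scratch and then stored in the TM for next time.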
48. A DREAM COME
TRUE?
NOT REALLY
TO ENJOY ALL THE BENEFITS OF CAT TOOLS
FIRST YOU HAVE TO CREATE A TM AND A
TERMINOLOGY DATABASE:
•either from your old translations
•or from new translations (i.e. creating a TM from
scratch)
THAT IS WHERE OTHER CAT TOOLS (i.e. NON-
TM CAT tools) STEP IN TO SAVE THE DAY!!!
49. REUSING YOUR OLD TRANSLATIONS
The best way to make a TM:
reliable source (YOU did the translation)
readily available (stored on your PC)
50. A BRIEF DIGRESSION
The term LOCALIZATION has often
popped up in previous slides
What is LOCALIZATION?
51. WHAT IS LOCALIZATION?
Localization is the process of adapting, translating and customizing a
product (software) for a specific market (for a specific locale or cultural
conventions; the locale usually determines conventions such as sort order,
keyboard layout, date, time, number and currency formats). In terms of
software localization, this means the production of interfaces that are
meaningful and comprehensible to local users.
The Localization Industry Standards Association (LISA) defines
localization as: “Localization involves taking a product and making it
linguistically and culturally appropriate to the target locale (country/region and
language) where it will be used and sold.”
Typically, this involves the translation of the user interface (the messages a
program presents to users) to enable them to create documents and data,
modify them, print them, send them by e-mail, etc.
52. LOCALIZATION – what it includes
Focal points of internationalization and localization efforts include:
Language:
Computer-encoded text
Alphabets/scripts; different systems of numerals; left-to-right scripts vs. right-to-left scripts. Most recent systems
use Unicode to solve many of these character encoding problems.
Graphical representations of text (printed materials, online images containing text)
Spoken (Audio)
Sub-titles for video
Date/time format, including use of different calendars
Formatting of numbers (decimal points, positioning of separators, character used as separator)
Time zones (UTC in internationalized environments)
Currency
Images and colors: issues of comprehensibility and cultural appropriateness
Names and titles
Government assigned numbers (such as the Social Security number in the US, National Insurance
number in the UK) and passports
Telephone numbers, addresses and international postal codes
Weights and measures
Paper sizes
Differences between local standards (e.g. YU ISO or JUS) and international standards (ISO)
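To make the locale-dependent items above more concrete, here is a minimal Python sketch of how the same number and date are rendered for two locales. The `CONVENTIONS` table and the functions `format_number` and `format_date` are hypothetical and hand-written for this illustration; real software would normally take such conventions from a locale database rather than hard-coding them.

```python
from datetime import date

# Hand-written convention table, for illustration only (an assumption of
# this sketch, not part of any real localization tool).
CONVENTIONS = {
    "en_US": {"thousands": ",", "decimal": ".", "date": "{m:02d}/{d:02d}/{y}"},
    "de_DE": {"thousands": ".", "decimal": ",", "date": "{d:02d}.{m:02d}.{y}"},
}


def format_number(value, locale_id):
    """Format a number with two decimals using the locale's separators."""
    conv = CONVENTIONS[locale_id]
    text = f"{value:,.2f}"  # US-style grouping, e.g. 1,234,567.89
    # Replace both separators in one pass so they cannot overwrite each other.
    table = str.maketrans({",": conv["thousands"], ".": conv["decimal"]})
    return text.translate(table)


def format_date(value, locale_id):
    """Format a date in the day/month order the locale expects."""
    pattern = CONVENTIONS[locale_id]["date"]
    return pattern.format(d=value.day, m=value.month, y=value.year)


if __name__ == "__main__":
    for loc in CONVENTIONS:
        print(loc, format_number(1234567.891, loc), format_date(date(2024, 3, 5), loc))
```

The data never changes; only its presentation does. That separation of data from locale-specific presentation is what internationalization prepares and localization fills in.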
53. LOCALIZATION vs. INTERNATIONALIZATION
The distinction between internationalization
and localization is subtle but important:
Internationalization is the adaptation of
products for potential use virtually everywhere,
while
localization is the addition of special features
for use in a specific locale.
The processes are complementary, and must be
combined to lead to the objective of a system that works
globally.
54. CAT tools for localization
Over the last couple of years, in addition to
general-purpose TM tools such as Trados and
Transit, translation technology companies
have also developed a number of TM tools
specially designed for localization:
Alchemy CATALYST
PASSOLO
Sisulizer
56. Other CAT tools (non-TM based)
As we said earlier, computer-assisted
translation (CAT) is a broad and somewhat
imprecise term covering a range of tools, from
the fairly simple to the more complicated,
which can include:
Word processors, grammar and spell
checkers, terminology managers, eBooks,
eDictionaries, full-text search tools,
concordancers, web, TM tools, bitexts, etc.
57. CAT - REFERENCE MATERIALS
Reference materials are the primary source of
terminology in the absence of a translation
memory.
Computer-based reference materials can be
classified into:
Online libraries
Specialized web resources
Specialized software products
Other materials in electronic formats
58. Online Libraries
Large collections of books in electronic form,
e.g.
eBrary (new scientific books, pay site)
Internet Archive (hosting “A Million Book
Project”)
Project Gutenberg (PD fiction books, free)
Questia (popular titles – fiction and non-fiction,
pay site – some sections free)
72. COCA = Corpus of Contemporary
American English
74. Specialized software products
Various programs that can be used for
terminology extraction:
Electronic dictionaries
General monolingual: e.g. OED v3
Specialized monolingual: e.g. Cambridge
Pronouncing Dictionary, Collins Collocations
Bilingual: e.g. Morton Benson, MidiDict
Electronic Bible (e.g. e-Sword)
Concordance programs (e.g. Concordancer)
Data-mining programs (e.g. Summarizer Pro)
77. Concordancers
Make it possible to see a word in context:
Useful for finding collocations and phrases
Useful for extracting terminology
Two types:
Monolingual concordancers (e.g. WordSmith)
Polylingual concordancers (e.g. ParaConc)
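What "seeing a word in context" means can be shown with a very small monolingual keyword-in-context (KWIC) sketch in Python. It is not modelled on WordSmith, ParaConc or any other product; the function name `concordance`, the `width` parameter and the sample text are assumptions made for this illustration.

```python
import re


def concordance(text, keyword, width=30):
    """Return one keyword-in-context line per whole-word hit of keyword.

    Each line shows up to `width` characters on either side of the hit,
    with the left context right-aligned so the keywords line up in a column.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines


# Invented sample text, for illustration only.
sample = ("A translation memory stores segments. "
          "The memory grows as the translator works. "
          "Memory lookups return exact or fuzzy matches.")

if __name__ == "__main__":
    for line in concordance(sample, "memory"):
        print(line)
```

With the keywords aligned in one column, recurring neighbours such as "translation memory" stand out, which is why concordancers are useful for spotting collocations and candidate terms.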