Foundations
Machine Translation
Post-Editing
Copyright: Welocalize, Inc. 2014. All Rights Reserved
machine.translation
machine.translation
• Contracts
• Patents
• Annual Reports
• Light Marketing
• Software Documentation
• Software User Interface
• SEO (Search Engine Optimization)
• e-Learning Content
• User Guides
• Internal Corporate Communications
• Wikis
• Knowledge Bases
• Proposals / Draft Applications
• User Generated Content
Different use cases for MT (audience? perishability? visibility?)
why.mt
For clients
– Increase throughput and consistency
– Reduce cost of translation
– Content explosion due to the Internet
– Most Internet content is in English, while the user community is global
– Desire to also translate “lower quality” content, such as User Generated Content (UGC), at a profitable price
– Quality of MT has improved (new technologies, lots of research)
For the translator
– Increase throughput and consistency
– MT is likely to become commonplace, like TMs before
– More and more clients and LSPs use MT
– Be an early adopter
– MT and new forms of post-editing requirements are evolving fast
basic.concepts
MT in a nutshell
[…] Machine Translation provides a set of tools by which digital text is automatically
translated from one language (e.g. English) into another language (e.g. Spanish).
Source: Systran user guide
There are 3 main types of MT systems with different underlying logics:
• Rules-based (RBMT)
• Statistical (SMT)
• Hybrid (SMT + RBMT)
Most systems used today are either statistical or hybrid. All system types can be customized for specific clients, incorporating client Translation Memories, basic preferences and/or terminology lists.
basic.concepts
Data layers used to train an MT system:
• General language data: anything to “teach the system the basics of the language pair”, so all of: tourism, IT, automotive, literature, …
• Domain-specific data: chemistry or mechanical or IT or …
• Client-specific data: TMs, glossaries
Baseline only: e.g. Google Translate and Bing
Customizable MT systems (licensed or open source)
basic.concepts
Understanding statistical MT
For the translator, it is important to understand that SMT systems are
based on algorithms calculating probabilities within a given set of data
(bilingual and monolingual).
In other words, the system learns from legacy human translations
(Translation Memories in our case) and calculates probabilities of most
likely translations from these, without applying linguistic rules as such.
basic.concepts
The logic behind statistical machine translation (SMT)
Imagine the TM(s) as an aligned data corpus.
Example: terminology
The term click appears more than 16,000 times in TM A.
In 90% of cases it is translated as fare clic,
in 10% as selezionare, scegliere, …
So the probability is high that the machine translation will be fare clic.
…BUT, maybe…
The string click OK appears 500 times in TM A.
In 50% of cases it is translated as fare clic su OK,
in 50% as selezionare OK.
So there is a 50% probability that the machine translation will be selezionare OK.
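The counting logic in this example can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name `translation_probabilities` and the toy corpus `toy_tm` are hypothetical, the counts are scaled down from the slide's figures, and a real SMT system combines such frequencies with other models rather than using them alone.

```python
from collections import Counter


def translation_probabilities(aligned_pairs, source_phrase):
    """Relative frequency of each target rendering of source_phrase."""
    counts = Counter(tgt for src, tgt in aligned_pairs if src == source_phrase)
    total = sum(counts.values())
    if total == 0:
        return {}  # phrase never seen in the corpus
    return {tgt: n / total for tgt, n in counts.items()}


# Hypothetical toy TM with the same proportions as the slide, scaled down.
toy_tm = (
    [("click", "fare clic")] * 9
    + [("click", "selezionare")] * 1
    + [("click OK", "fare clic su OK")] * 5
    + [("click OK", "selezionare OK")] * 5
)

click_probs = translation_probabilities(toy_tm, "click")
click_ok_probs = translation_probabilities(toy_tm, "click OK")
print(click_probs)
print(click_ok_probs)
```

With evenly split counts for click OK, neither rendering is preferred, which is why the output for such strings can go either way.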
typical.examples
The quality of raw MT output can vary. A distinction is typically made as follows:
good > perfect to overall understandable and fairly fluent
medium > contains useful chunks, terms and occasionally perfect output; more or less understandable, little fluency
poor > poor with regard to understandability and fluency
• We carry out content evaluations to prevent content with overall poor MT output from going into production
• Medium is the broadest category and can still lead to productivity gains when used as a basis for post-editing
typical.examples
Example of raw MT output of varying quality (slide image not preserved).
typical.examples
Know the patterns of MT output
Even “good” MT output is not expected to be perfect. Depending on the underlying MT logic and the language pair, there tend to be typical issues to fix, e.g.:
– issues around capitalization
– punctuation (source punctuation is copied)
– spacing
– omissions/additions of text (usually different in nature from those in fuzzy matches)
– unknown/new words may be translated literally or be left in English
– word order: can mirror the source
– compound formation
– word form agreement
→ Being aware of typical issues helps you post-edit well
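Some of these patterns are regular enough to flag automatically. The sketch below is a hypothetical helper, not a Welocalize tool: it covers only three of the issues listed (spacing, capitalization, words left in English), the name `find_typical_issues` and its `do_not_translate` parameter are assumptions, and the word matching is ASCII-only for brevity.

```python
import re


def find_typical_issues(source, target, do_not_translate=()):
    """Return labels for surface issues spotted in one MT segment."""
    issues = []
    # Spacing: two or more consecutive spaces in the target.
    if re.search(r" {2,}", target):
        issues.append("spacing: repeated spaces")
    # Capitalization: source starts with a capital, target does not.
    if source[:1].isupper() and target[:1].islower():
        issues.append("capitalization: segment starts in lower case")
    # Words of 4+ letters found unchanged in both segments may be untranslated.
    keep = {word.lower() for word in do_not_translate}
    src_words = set(re.findall(r"[A-Za-z]{4,}", source.lower()))
    tgt_words = set(re.findall(r"[A-Za-z]{4,}", target.lower()))
    untranslated = sorted((src_words & tgt_words) - keep)
    if untranslated:
        issues.append("possibly untranslated: " + ", ".join(untranslated))
    return issues


issues = find_typical_issues(
    "Click the Settings button to open the dashboard.",
    "fare clic sul pulsante Settings per aprire  il dashboard.",
    do_not_translate=["Settings"],
)
print(issues)
```

Checks like these only point at candidates. Legitimate loanwords, for instance, will be flagged as possibly untranslated, so the post-editor still decides what to change.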
post.editing
What is Post-Editing?
post.editing
In other words…
• Make changes where necessary, using as much of the MT output as possible (based on language and client requirements)
• Read the MT output & the source > decide quickly what can be used
• Use as many “bits/sections” of the MT output as possible: move them around, correct word forms, change the part of speech, use them as inspiration
• Look up key terms in your reference material as usual, but also learn to trust the customized output
• Automate with customized QA checks
Adjust your expectations. Rethink your approach. Report recurring errors.
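One rough way to see how much of the MT output was actually reused is to compare the raw output with the post-edited segment word by word. The sketch below uses Python's standard `difflib` module. The function name `retained_share` and the sample segments are hypothetical, and this is not a metric named in this deck, only an illustration of "using as much of the MT output as possible".

```python
from difflib import SequenceMatcher


def retained_share(mt_output, post_edited):
    """Share of post-edited words carried over unchanged from the MT output."""
    mt_words = mt_output.split()
    pe_words = post_edited.split()
    if not pe_words:
        return 0.0
    matcher = SequenceMatcher(a=mt_words, b=pe_words, autojunk=False)
    # Sum the sizes of all word runs that appear in both segments, in order.
    kept = sum(block.size for block in matcher.get_matching_blocks())
    return kept / len(pe_words)


mt = "Fare clic OK per salvare il file"
pe = "Fare clic su OK per salvare il file"
share = retained_share(mt, pe)
print(f"{share:.0%} of the post-edited words come from the MT output")
```

A high share suggests the MT output was a useful starting point. A consistently low share for one content type is worth reporting as a recurring issue.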
full.post.editing
full post-editing: “publishable quality”
If the client requests “full post-editing”, this means publishable quality. The post-editor is responsible for ensuring that the client's requirements for final quality are met.
► Client Glossary, TM, Style Guide and others apply
Examples:
• infinitive / imperative preferences?
• passive / active preferences?
• formal / informal preferences?
• different styles for headers, lists, tables?
• special formatting of UI options? (bilingual, English)
• are measurements to be converted?
• terminology
light.post.editing
light post-editing / “understandable quality”
Full Post-Editing | Light Post-Editing
Grammar and spelling are correct | Minor issues in grammar (and spelling) are acceptable
Terminology is accurate & consistent | Terminology is understandable and actionable
Spelling is consistent (e.g. hyphenation) | Variations in spelling are acceptable
Style is consistent (headers, list items, …) | Style variations are acceptable
Punctuation is correct | Variations/errors in punctuation are acceptable
Style & tone are appropriate for content | Style & tone are not offensive
Specific requirements: 33 cm (13"); change EN quotation marks to FR/DE/… | Follow MT output, e.g. keep proposed number format 13" (33 cm), English quotation marks, …
… | …
post.editing
light post-editing versus full post-editing
Image © Common Sense Advisory, “Post-Edited machine translation defined”, April 30, 2013
post.editing
Notes on productivity
Just as with human translation, throughput can vary and depends on:
– language pair
– content type & complexity
– experience
– domain knowledge
– quality requirements
– use of automatic QA tools
– quality of TM and reference material
With MT, additional factors are:
– quality of the MT
– experience with post-editing
Compared to average daily throughput for human translation, average daily throughput for full post-editing can be up to 3 times higher.
take.aways
• There are different use cases for MT, associated with different levels of final (post-edited) quality
• When full PE is requested, this means publishable quality
• There are different MT systems; Welocalize works with a range of them
• MT output varies in quality; we evaluate it with our translation partners to ensure the necessary quality for post-editing is met
• MT is not expected to be perfect; that's why we need post-editors!
• Post-editing replaces the translation stage in the workflow, but it is a different task, cognitively
• MT systems can improve through adding more data & through constructive feedback from post-editors
trademark.disclaimer:
Product names, logos, brands and other trademarks referenced within this presentation are the property of their respective trademark holders. These trademark holders are not owned by or affiliated with Welocalize, Inc., our products, or our website. They do not sponsor or endorse our materials. Reference is for educational purposes only.
Questions?
Contact the Welocalize Language Tools Team
lena.marg@welocalize.com, elaine.ocurran@welocalize.com
Welocalize
Frederick, Maryland - Headquarters
241 East 4th St. Suite 207
Frederick, Maryland 21701 USA
[t] +1.301.668.0330
[t] +1.800.370.9515 Toll Free
www.welocalize.com
Welocalize Machine Translation Post Editing Basics Course I

  • 3. machine.translation: Different use cases for MT (audience? perishability? visibility?)
      - Contracts
      - Patents
      - Annual Reports
      - Light Marketing
      - Software Documentation
      - Software User Interface
      - SEO (Search Engine Optimization)
      - e-Learning Content
      - User Guides
      - Internal Corporate Communications
      - Wikis
      - Knowledge Bases
      - Proposals / Draft Applications
      - User Generated Content
  • 4. why.mt
      For clients:
      - Increase throughput and consistency
      - Reduce the cost of translation
      - Content explosion due to the Internet
      - Most Internet content is in English, while the user community is global
      - Desire to also translate “lower quality” content, such as User Generated Content (UGC), at a profitable price
      - Quality of MT has improved (new technologies, lots of research)
      For the translator:
      - Increase throughput and consistency
      - MT is likely to become commonplace, as TMs did before
      - More and more clients and LSPs use MT
      - Be an early adopter
      - MT and new forms of post-editing requirements are evolving fast
  • 5. basic.concepts: MT in a nutshell
      “[…] Machine Translation provides a set of tools by which digital text is automatically translated from one language (e.g. English) into another language (e.g. Spanish).” (Source: Systran user guide)
      There are 3 main types of MT systems, each with a different underlying logic:
      - Rules-based (RBMT)
      - Statistical (SMT)
      - Hybrid (SMT + RBMT)
      Most systems used today are either statistical or hybrid. All system types can be customized for specific clients by incorporating client Translation Memories, basic preferences and/or terminology lists.
  • 6. basic.concepts: Customizable MT systems (licensed or open source)
      - Client-specific data: TMs, glossaries
      - Domain-specific data: chemistry or mechanical or IT or…
      - General language data: anything to “teach the system the basics on the language pair”, so all of: tourism, IT, automotive, literature,…
      E.g. Google Translate and Bing would be baseline only.
  • 7. basic.concepts: Understanding statistical MT
      For the translator, it is important to understand that SMT systems are based on algorithms that calculate probabilities within a given set of data (bilingual and monolingual). In other words, the system learns from legacy human translations (Translation Memories in our case) and calculates the most likely translations from these, without applying linguistic rules as such.
  • 8. basic.concepts: The logic behind statistical machine translation (SMT)
      Imagine the TM(s) as an aligned data corpus. Example (terminology):
      - The term “click” appears more than 16,000 times in TM A. In 90% of cases it is translated as “fare clic”; in 10% as “selezionare”, “scegliere”, …
        The probability is high that the machine translation will be “fare clic”.
      - BUT, maybe… the string “click OK” appears 500 times in TM A. In 50% of cases it is translated as “fare clic su OK”; in 50% as “selezionare OK”.
        The probability is 50% that the machine translation will be “selezionare OK”.
  • 9. typical.examples
      The quality of raw MT output can vary. A distinction is typically made as follows:
      - good: ranges from perfect to overall understandable and fairly fluent
      - medium: contains useful chunks, terms and occasionally perfect output; more or less understandable, little fluency
      - poor: poor with regard to understandability and fluency
      We carry out content evaluations to prevent content with overall poor MT output from going into production.
      Medium is the broadest category and can still lead to productivity gains when used as a basis for post-editing.
  • 10. typical.examples: The quality of raw MT output can vary. Example:
  • 11. typical.examples: Know the patterns of MT output
      Even “good” MT output is not expected to be perfect. Depending on the underlying MT logic and the language pair, there tend to be typical issues to fix, e.g.:
      - capitalization
      - punctuation (source punctuation is copied)
      - spacing
      - omissions/additions of text (usually different in nature from those in fuzzy matches)
      - unknown/new words may be translated literally or left in English
      - word order: may mirror the source
      - compound formation
      - word form agreement
      Being aware of typical issues helps good post-editing.
  • 12. typical.examples
  • 13. typical.examples
  • 14. post.editing: What is Post-Editing?
  • 15. post.editing: In other words…
      - Make changes where necessary, using as much of the MT output as possible (based on language and client requirements)
      - Read the MT output and the source, then decide quickly what can be used
      - Use as many “bits/sections” of the MT output as possible: move them around, correct word forms, change the part of speech, use them as inspiration
      - Look up key terms in your reference material as usual, but also learn to trust the customized output
      - Automate with customized QA checks
      Adjust your expectations. Rethink your approach. Report recurring errors.
  • 16. full.post.editing: full post-editing means “publishable quality”
      If the client requests “full post-editing”, this means publishable quality. The post-editor is responsible for ensuring that the client's requirements with regard to final quality are met.
      Client Glossary, TM, Style Guide and other references apply. Examples:
      - infinitive / imperative preferences?
      - passive / impassive preferences?
      - formal / informal preferences?
      - different styles for headers, lists, tables?
      - special formatting of UI options? (bilingual, English)
      - are measurements to be converted?
      - terminology
  • 17. light.post.editing: light post-editing / “understandable quality”
      Full Post-Editing | Light Post-Editing
      Grammar and spelling are correct | Minor issues in grammar (and spelling) are acceptable
      Terminology is accurate & consistent | Terminology is understandable and actionable
      Spelling is consistent (e.g. hyphenation) | Variations in spelling are acceptable
      Style is consistent (headers, list items,…) | Style variations are acceptable
      Punctuation is correct | Variations/errors in punctuation are acceptable
      Style & tone are appropriate for content | Style & tone are not offensive
      Specific requirements: 33 cm (13"); change EN quotation marks to FR/DE/… | Follow MT output, e.g. keep proposed number format 13" (33cm), English quotation marks,…
      … | …
  • 18. post.editing: light post-editing versus full post-editing
      Image © Common Sense Advisory, “Post-Edited machine translation defined”, April 30, 2013
  • 19. post.editing: Notes on productivity
      Just as with human translation, throughput can vary and depends on:
      - language pair
      - content type & complexity
      - experience
      - domain knowledge
      - quality requirements
      - use of automatic QA tools
      - quality of TM and reference material
      With MT, additional factors are:
      - quality of the MT
      - experience with post-editing
      Compared to average daily throughput for human translation, average daily throughput for full post-editing can be up to 3 times higher.
  • 20. take.aways
      - There are different use cases for MT, associated with different levels of final (post-edited) quality
      - When full PE is requested, this means publishable quality
      - There are different MT systems; Welocalize works with a range of them
      - MT output varies in quality; we evaluate it with our translation partners to ensure the necessary quality for post-editing is met
      - MT is not expected to be perfect; that's why we need post-editors!
      - Post-editing replaces the translation stage in the workflow, but it is a cognitively different task
      - MT systems can improve through additional data and through constructive feedback from post-editors
  • 21. trademark.disclaimer: Product names, logos, brands and other trademarks referenced within this presentation are the property of their respective trademark holders. These trademark holders are not owned by or affiliated with Welocalize, Inc., our products, or our website. They do not sponsor or endorse our materials. Reference is for educational purposes only.
  • 22. Questions? Contact the Welocalize Language Tools Team: lena.marg@welocalize.com, elaine.ocurran@welocalize.com
      Welocalize Headquarters: 241 East 4th St., Suite 207, Frederick, Maryland 21701, USA
      [t] +1.301.668.0330
      [t] +1.800.370.9515 (toll free)
      www.welocalize.com
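
The “click” / “click OK” example on slide 8 can be sketched in a few lines of Python. This is only an illustration of the relative-frequency idea, not how any particular MT engine is implemented: real SMT systems also draw on monolingual data and further scoring steps. The function name translation_probabilities and the toy corpus toy_tm are hypothetical, and the counts are scaled down from the slide (10 segments instead of more than 16,000) while keeping the same 90/10 and 50/50 ratios.

```python
from collections import Counter


def translation_probabilities(aligned_pairs, source_phrase):
    """Estimate P(target | source_phrase) by relative frequency.

    aligned_pairs is an iterable of (source, target) segments, i.e. a
    toy stand-in for a Translation Memory. Hypothetical helper, for
    illustration only.
    """
    counts = Counter(tgt for src, tgt in aligned_pairs if src == source_phrase)
    total = sum(counts.values())
    if total == 0:
        # The phrase never occurs in the corpus: nothing to learn from.
        return {}
    return {tgt: n / total for tgt, n in counts.items()}


# Toy TM with the same ratios as the slide (assumed, scaled-down counts).
toy_tm = (
    [("click", "fare clic")] * 9
    + [("click", "selezionare")] * 1
    + [("click OK", "fare clic su OK")] * 5
    + [("click OK", "selezionare OK")] * 5
)

click_probs = translation_probabilities(toy_tm, "click")
click_ok_probs = translation_probabilities(toy_tm, "click OK")

print(click_probs)
print(click_ok_probs)
```

For the post-editor, the point is the second result: when the legacy translations are split evenly, the engine has no strong preference, so inconsistent TM content tends to show up as inconsistent MT output.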

Editor's notes

  1. This one is OK to leave as is: it is a list of many IT companies' names and doesn't really point at a specific client.
  2. Same as Dell