Thinking Strategically About Content Destined for Machine Translation
Val Swisher
Founder & CEO
@contentrulesinc

© 2013. Content Rules, Inc. All rights reserved.
Who Am I?

• Founder and CEO of Content Rules
• 25+ years in content arena
• Specialty areas:
  • Global content strategy
  • Terminology management
  • Content quality
  • Single-sourcing / XML / DITA
• Finishing third book, “Global Content Strategy,” due out in 2014
What is Content Rules?
• Professional services firm specializing in:
  • Content strategy / Global content strategy
  • Content creation
  • Content quality / Global readiness
• Based in Silicon Valley
• Founded in 1994
• Acrolinx Authorized Services Provider
• Authorized provider of The Rockley Strategic Method™

Global Readiness
• Ensure content is translatable
  • Readability
  • Grammar and style
  • Reuse
• Evaluate and improve content quality using state-of-the-art tools
  • Reports
  • Metrics
  • Recommendations
  • Fixes
• Save money on translation

Today’s Presentation

• Importance of content
• Historic background
• Types of machine translation
• Content quality affects machine translation results
• Bleu scores
• Pre-editing instead of post-editing

Content Is Important

87% of respondents to a recent CMO Council survey said that content had
a moderate to major impact on their buying decisions
Content Is A Strategic Asset

What Does It Mean to be Strategic?
stra·te·gic
[struh-tee-jik]
adjective
1. pertaining to, characterized by, or of the
nature of strategy: strategic movements.
2. important in or essential to strategy.
3. forming an integral part of a stratagem:
a strategic move in a game of chess.

Content Creation In the Past

• Content wasn't so easy to create and distribute
• Created by trained professionals
• Only they had access to the content

Content Creation Today

• Everyone creates content
• Very easy to distribute
• Now, we have loads and loads of content
  • Some of it good
  • Some of it mediocre
  • Some of it downright awful

Translation In The Past

• Content wasn't so easy to translate
• Trained professionals
• Only they understood multiple languages well enough to translate content

Translation Today
• It is easy and free to translate content
• We have loads and loads of translated content
  • Some of it good
  • Some of it mediocre
  • Some of it downright awful

More Machine Translation All The Time
• Machine translation (MT) is increasingly relied upon as a way to get cost-effective, fast translations
• 18.05% year-over-year growth of MT expected over the next 3 years*
• We must pay more attention to the source content that goes into it
• A machine cannot figure out what we meant to say based on what we actually wrote
• Garbage In – Garbage Out

*http://www.researchandmarkets.com/research/2gpj3p/global_machine
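Read as a rate that compounds every year, 18.05% growth over three years would bring MT to roughly 1.65 times its starting level. A minimal Python sketch of that arithmetic, assuming the rate stays constant and compounds annually (the function name is illustrative, not from the cited report):

```python
def cumulative_growth_factor(annual_rate: float, years: int) -> float:
    """Multiplier on the starting level after `years` of compound growth.

    Assumption: the same annual rate applies every year and compounds.
    """
    return (1.0 + annual_rate) ** years


factor = cumulative_growth_factor(0.1805, 3)
print(f"{factor:.3f}")  # 1.645
```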
Source Content And Machine Translation

• Types of MT engines and the effect of source content on them
• What are Bleu scores
• How quality of content affects MT output

MT Engine Types

There are three types of MT engines:
1. Rule-based
2. Statistical
3. Hybrid

Rule-Based MT (RBMT)

• Uses linguistic rules
• Extensive use of bilingual dictionaries
• Transfers structure of source language into target language
• Results are literal translations based on rules
• Does not handle ambiguity well (word or phrase having more than one meaning)
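To make the last two points concrete, here is a toy Python sketch of rule-based transfer: a tiny bilingual dictionary plus one reordering rule. The dictionary, the rule, and the English-to-Spanish sentence are hypothetical and far simpler than any real RBMT engine.

```python
# Toy rule-based MT: bilingual dictionary lookup plus one transfer rule.
# Hypothetical data; real RBMT engines use full grammars and large dictionaries.
BILINGUAL_DICTIONARY = {
    "the": ["el"],
    "red": ["rojo"],
    "car": ["coche"],
    "is": ["está"],
    "near": ["cerca de"],
    "bank": ["banco", "orilla"],  # ambiguous: financial institution vs. riverbank
}
ADJECTIVES = {"red"}


def translate_rbmt(sentence: str) -> tuple[str, list[str]]:
    """Translate word by word; also return words the rules could not disambiguate."""
    words = sentence.lower().split()

    # Transfer rule: English "adjective noun" becomes Spanish "noun adjective".
    # (Assumes the word after an adjective is the noun it modifies.)
    i = 0
    while i < len(words) - 1:
        if words[i] in ADJECTIVES:
            words[i], words[i + 1] = words[i + 1], words[i]
            i += 2  # skip the pair we just reordered
        else:
            i += 1

    output, ambiguous = [], []
    for word in words:
        senses = BILINGUAL_DICTIONARY.get(word, [word])  # unknown words pass through
        if len(senses) > 1:
            ambiguous.append(word)
        output.append(senses[0])  # no context, so always take the first sense
    return " ".join(output), ambiguous


print(translate_rbmt("The red car is near the bank"))
```

The output is literal ("cerca de el" where a translator would write "cerca del"), and "bank" always gets the financial sense, whatever the writer meant.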

Statistical MT (SMT)

• Based on analysis of content
• Engine trained over time
• More content = better results
• Needs at least 2,000,000 words per domain
• Better quality content = better results
• Results are more natural translations, based on previous source/target pairs
• Example: Google Translate
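The core statistical idea can be sketched in a few lines of Python: count how often each source phrase was paired with each target phrase in the training data, then turn the counts into probabilities (relative frequency). The phrase pairs are hypothetical, and real SMT systems also learn word alignments, reordering, and a target-language model. The example shows two effects of source quality: a bad pair in the training data takes a share of the probability, and an unapproved variant the engine never saw ("log on" instead of "log in") gets no translation at all.

```python
from collections import Counter, defaultdict


def train_phrase_table(aligned_pairs):
    """Estimate P(target | source) by relative frequency of aligned phrase pairs."""
    counts = defaultdict(Counter)
    for source, target in aligned_pairs:
        counts[source][target] += 1
    table = {}
    for source, targets in counts.items():
        total = sum(targets.values())
        table[source] = {target: n / total for target, n in targets.items()}
    return table


def best_translation(table, source):
    """Pick the most probable target phrase, or None if the source was never seen."""
    candidates = table.get(source)
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# Hypothetical English-Spanish training pairs; the third one is a poor translation.
training_data = [
    ("log in", "iniciar sesión"),
    ("log in", "iniciar sesión"),
    ("log in", "registro"),
    ("sign in", "iniciar sesión"),
]
table = train_phrase_table(training_data)
print(table["log in"])                    # "registro" gets 1/3 of the probability
print(best_translation(table, "log in"))  # iniciar sesión
print(best_translation(table, "log on"))  # None: never seen in training
```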

Hybrid

• Combines rule-based and statistical approaches
• Provides predictability and consistency of RBMT
• Provides fluency and flexibility of SMT
• Reduces the amount of data needed to train the engine
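One way to picture the combination, sketched in Python under heavy simplification: rule-based dictionary entries (here, an approved UI term) are applied first so they translate the same way every time, and a statistical phrase table covers everything else. The terms and probabilities are hypothetical, and real hybrid engines combine the two approaches in more varied ways.

```python
# Rule-based part: approved terminology that must always translate the same way.
RULE_DICTIONARY = {"save": "guardar"}  # in a software UI, "save" means store a file

# Statistical part: hypothetical probabilities learned from training data.
STATISTICAL_TABLE = {
    "save": {"ahorrar": 0.6, "guardar": 0.4},  # data skewed toward "save money"
    "the file": {"el archivo": 0.9, "el fichero": 0.1},
}


def translate_hybrid(phrases):
    output = []
    for phrase in phrases:
        key = phrase.lower()
        if key in RULE_DICTIONARY:
            output.append(RULE_DICTIONARY[key])  # rule wins: predictable and consistent
        elif key in STATISTICAL_TABLE:
            options = STATISTICAL_TABLE[key]
            output.append(max(options, key=options.get))  # most probable option
        else:
            output.append(phrase)  # unknown: left as-is for post-editing
    return " ".join(output)


print(translate_hybrid(["Save", "the file"]))  # guardar el archivo
```

Without the rule, the statistical table alone would have chosen "ahorrar" ("save" as in saving money).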

Training The SMT Beast

• Training SMT software is extremely important
• Poor quality source = poor quality translations
• Some companies have such poorly trained MT engines that fixing the content first is actually not an option
• The engine has been trained to translate poor quality source

The Effect Of Poor Content On SMT And Hybrid MT

• Poor or unpredictable translations
• Increased time to retrain the system with correct information
• Increased post-editing, per language
• Wasted money

Evaluating MT Precision - Bleu Scores

• Introduced in 2002 by the IBM Watson Research Center
• Automatic evaluation metric used to compare MT output with reference human translation
  “The closer a machine translation is to a professional human translation, the better it is.” *
• Metric widely used throughout the industry
*http://acl.ldc.upenn.edu/P/P02/P02-1040.pdf
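A minimal Python sketch of the scoring idea from the cited paper: clipped ("modified") n-gram precision for n = 1 to 4, combined by a geometric mean and multiplied by a brevity penalty. It scores a single sentence with uniform weights, whitespace tokens, and no smoothing; the paper defines the score over a whole test corpus, so treat this as an illustration rather than a reference implementation.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU: uniform weights, whitespace tokens, no smoothing."""
    cand = candidate.split()
    refs = [ref.split() for ref in references]
    if not cand:
        return 0.0

    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count at its maximum count in any one reference.
        max_ref_counts = Counter()
        for ref in refs:
            for gram, count in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # a zero precision makes the geometric mean zero
        log_precision_sum += math.log(clipped / total)

    # Brevity penalty against the reference length closest to the candidate's.
    c = len(cand)
    r = min((abs(len(ref) - c), len(ref)) for ref in refs)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum / max_n)


reference = ["the cat is on the mat"]
print(bleu("the cat is on the mat", reference))                 # 1.0
print(bleu("the cat sat on the mat", reference, max_n=2))       # sqrt(5/6 * 3/5), about 0.707
print(bleu("the the the the the the the", reference, max_n=1))  # clipped to 2/7
```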
Bleu Scores – Helpful Or Hype?
According to Callison-Burch, Osborne, and Koehn of the School of Informatics, University of Edinburgh, Bleu scores have many issues*:
• Synonyms and paraphrases are difficult to score
• All words are weighted equally
• Difficult to calculate

*http://homepages.inf.ed.ac.uk/pkoehn/publications/bleu2006.pdf
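A small illustration of the first two issues, using plain unigram precision (one simplified ingredient of Bleu, not the full metric). The sentences are hypothetical: a correct synonym ("launch") and a meaning-reversing error ("stop") receive exactly the same score.

```python
from collections import Counter


def unigram_precision(candidate: str, reference: str) -> float:
    """Fraction of candidate words found in the reference, with counts clipped."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    matched = sum(min(count, ref[word]) for word, count in cand.items())
    return matched / sum(cand.values())


reference = "click the button to start the program"
synonym = "click the button to launch the program"  # same meaning, different word
wrong = "click the button to stop the program"      # opposite meaning

print(unigram_precision(synonym, reference))  # 6 of 7 words match
print(unigram_precision(wrong, reference))    # also 6 of 7: same score
```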

That’s Okay. We Can Post-Edit.

Original Source Content → Post-Edited Translations

Why Not Pre-Edit Instead?

• Fewer issues = less post-editing
• Save time
• Save money
• Improve quality

Create Global-Ready Content

• Reduce word count
• Standardize terminology
• Enforce correct grammar
• Eliminate jargon and colloquialisms
• Increase reuse
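Checks such as "standardize terminology" and "eliminate jargon and colloquialisms" are what content quality tools automate. Purely as an illustration (not how Acrolinx or any other product works), a minimal Python sketch of a terminology check might look like this; the term list and sample text are hypothetical.

```python
import re

# Hypothetical terminology list: unapproved variant -> approved term.
TERM_RULES = {
    "log on": "log in",
    "sign in": "log in",
    "hit": "click",
    "a piece of cake": "easy",  # colloquialism
}


def find_term_issues(text: str) -> list[tuple[str, str]]:
    """Return (found phrase, suggested replacement) pairs in order of appearance."""
    issues = []
    for variant, approved in TERM_RULES.items():
        # Whole-word, case-insensitive match so "hit" does not fire inside "white".
        pattern = r"\b" + re.escape(variant) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            issues.append((match.start(), match.group(0), approved))
    return [(found, approved) for _, found, approved in sorted(issues)]


sample = "Sign in, then hit Save. Setup is a piece of cake."
print(find_term_issues(sample))
```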

Results of Pre-Editing

• Save money
• Improve quality
• Faster time to market
• Fewer in-country iterations
• Better translation consistency

Summary

• Content is a strategic asset
• Machine translation is becoming more popular
• Poor quality content incorrectly trains MT engines
• Poor quality content results in increased post-editing
• Pre-editing saves money and time, and improves translation quality

Val Swisher
CEO & Founder
vals@contentrules.com
@contentrulesinc
Reduce word count

• We recommend a maximum of 24 words per sentence for machine translation.
• It is hard for people to understand long sentences. Imagine software having to parse through all of those commas (half of which are probably missing or misplaced).
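A quick way to apply the 24-word guideline is to flag long sentences automatically. A minimal Python sketch, assuming sentences end with ".", "!", or "?" followed by whitespace (a naive rule that abbreviations such as "e.g." will trip up); the function name is hypothetical.

```python
import re

MAX_WORDS = 24  # the limit recommended on this slide


def long_sentences(text: str, max_words: int = MAX_WORDS) -> list[tuple[int, str]]:
    """Return (word count, sentence) for each sentence longer than max_words."""
    # Naive split: end punctuation followed by whitespace starts a new sentence.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    results = []
    for sentence in sentences:
        count = len(sentence.split())
        if count > max_words:
            results.append((count, sentence))
    return results


sample = "Click Save. " + " ".join(["word"] * 30) + "."
for count, sentence in long_sentences(sample):
    print(count, sentence[:20] + "...")
```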

Let's say we have 100,000 words of source content.
We are going to translate the content into 14 languages.
We will end up with 1.4 million words of content.
Let's say the 100,000 words contain all types of errors. We will have to post-edit and fix 1.4 million words on the other side.
Let's say we have to pay someone $.xx per word to post-edit the content.
That's $.xx × 1,400,000 words.
If we paid $.07 per word to pre-edit the content, we would have spent $7,000 on preparation.
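The arithmetic on this slide can be written out as a small Python sketch. It uses only the slide's figures (100,000 words, 14 languages, $.07 per word to pre-edit) and, rather than guessing the post-editing rate, computes the break-even rate above which pre-editing is cheaper. It assumes, as a simplification, that pre-editing the source removes the errors that would otherwise be post-edited in every language; the function names are illustrative.

```python
def translated_word_count(source_words: int, languages: int) -> int:
    """Each source word is translated, and potentially post-edited, once per language."""
    return source_words * languages


def break_even_post_edit_rate(source_words: int, languages: int, pre_edit_rate: float) -> float:
    """Post-editing rate ($/word) above which pre-editing the source once costs less.

    Simplifying assumption: pre-editing removes the errors that would otherwise
    have to be post-edited in every target language.
    """
    pre_edit_cost = source_words * pre_edit_rate
    return pre_edit_cost / translated_word_count(source_words, languages)


source_words, languages, pre_edit_rate = 100_000, 14, 0.07  # figures from the slide
print(translated_word_count(source_words, languages))  # 1400000
print(f"${source_words * pre_edit_rate:,.0f}")  # $7,000
print(f"${break_even_post_edit_rate(source_words, languages, pre_edit_rate):.3f} per word")  # $0.005 per word
```

Because the source word count cancels out, the break-even rate is simply the pre-editing rate divided by the number of languages: $.07 / 14 = $.005 per word.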

Speaker Notes

  1. According to the CMO Council and Netline in their June 2013 survey, “Understanding How BtoB Buyers Source, Value, and Share Content Online,” 87% of respondents stated that content had a moderate to major impact on their buying decisions. For far too long, content has been treated as something that simply describes, positions, or touts a product. Technical content, in particular, has long been an afterthought, something not deemed important. If we don't treat content as a strategic asset, it is just garbage. And if we put garbage into machine translation, we get that garbage multiplied across every target language.
  2. One poorly written source document = many poorly translated documents. Poor source content = more post-editing. Problems multiply with the number of language pairs.