No Hardware. No Software. No Hassle MT.
With KantanMT

3 Steps to Improve Quality
What we aim to cover today
• How to improve the quality of your MT engine
• A Build – Measure – Learn process
• How do we measure and quantify quality in MT?
• Practical illustrations of KantanMT in action throughout
• Questions & Answers

3 Steps to Higher Quality
Evolutionary Process
• Not a one-off step
• Continuous improvement loop
• Incremental improvements over time
GIGO
• Garbage In => Garbage Out
Production Engine
• At least three iterations
• Experimentation with different inputs
• Measurements of different outputs
• Control your own destiny
[Diagram: Build, Measure, Learn cycle around the KantanMT Engine]

Build – Building quality training streams
Training Data
• How KantanMT learns to translate
• Mimic your style, terminology, fluency
Bad Training Data
• Garbage In => Garbage Out
Three main factors:
• Quality
• Relevance to domain
• Quantity

Build – Building quality training streams
Training Data - Three main factors:
Quality
• The linguistic quality of the training material is crucially important
Relevance to domain
• A high quality MT system has good domain knowledge
• Similar to the way you’ve always worked with Translation Memories and CAT tools
Quantity
• The more training data you use to build your engine, the better its capacity to generate translations that mimic your translation style and terminology

Build – Building quality training streams
Balancing the equation
[Diagram: Quality]

Build – is Quantity important?
Not if quality is good – it’s a balancing act!
[Diagram: Quality]

Build – Building quality training streams
Quality Training Data - Suitable Sources
• KantanMT Stock Engines
• Translation Memories
• Other Translation Memories
• Monolingual (target only) data
[Diagram labels: Language Base Data, Translation sources, Domain Base Data, Monolingual Data, Training Data]

Build – Building quality training streams
Advantages of clean, high-quality training data
• Less correction of errors
• Finding the cause of errors is easier
• Easy to fill gaps
• Faster processing time
Large volumes of dirty training data
• Make correction difficult
• Make finding the root cause of a problem challenging
• Slow down training and processing
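
As a rough illustration of what cleaning a training stream can involve, the sketch below filters a list of source/target segment pairs. It is not how KantanMT cleans data; the function name `clean_training_pairs`, the specific checks and the `max_length_ratio` threshold are all assumptions chosen for the example.

```python
from typing import Iterable, List, Tuple

SegmentPair = Tuple[str, str]  # (source segment, target segment)


def clean_training_pairs(
    pairs: Iterable[SegmentPair],
    max_length_ratio: float = 3.0,  # assumed threshold, tune per language pair
) -> List[SegmentPair]:
    """Drop empty, untranslated, badly mismatched and duplicate pairs."""
    seen = set()
    cleaned = []
    for source, target in pairs:
        source, target = source.strip(), target.strip()
        if not source or not target:
            continue  # one side is empty
        if source == target:
            continue  # target is a copy of the source (likely untranslated)
        src_len, tgt_len = len(source.split()), len(target.split())
        if max(src_len, tgt_len) / min(src_len, tgt_len) > max_length_ratio:
            continue  # word counts differ too much to be a plausible translation
        if (source, target) in seen:
            continue  # exact duplicate of an earlier pair
        seen.add((source, target))
        cleaned.append((source, target))
    return cleaned
```

Checks like these are deliberately conservative: they remove only pairs that are very unlikely to help, and leave linguistic quality review to a human.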


Build – in KantanMT

Build – Measure – Learn
[Diagram: Build, Measure, Learn cycle around the KantanMT Engine]

Measure – KantanMT engine calibration
What to Measure?
• BLEU
• F-Measure
• Word-counts
• TER

Measure – KantanMT engine calibration
BLEU Score
• Scoring system developed to automate the process of evaluating MT output
• Internationally recognised and the most widely used measure of the quality of an MT engine
• The BLEU metric scores a translation on a scale of 0 to 100%
• The closer to 100%, the more closely the translation matches a human reference translation
• AIM: HIGH
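
To make the idea behind BLEU concrete, here is a minimal single-sentence, single-reference sketch: clipped n-gram precision for n = 1 to 4, combined by geometric mean and multiplied by a brevity penalty. It is a simplification, not KantanMT's scorer; production tools score whole test sets and apply smoothing, and the name `simple_bleu` is made up for this example.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count every n-gram (as a tuple of words) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Unsmoothed sentence-level BLEU against one reference, in the range 0..1."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        matches = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        if total == 0 or matches == 0:
            return 0.0  # no smoothing: any empty precision gives a zero score
        log_precisions.append(math.log(matches / total))
    # Penalise candidates that are shorter than the reference.
    if len(cand) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)
```

Multiply the result by 100 to read it on the 0 to 100% scale used on the slide.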

Measure – KantanMT engine calibration
F-Measure Score
• F-Measure is an automated measurement of the precision and recall capabilities of an engine
• A general guide to the overall quality performance of an engine
• Harmonic mean of the recall and precision measurements
• Displayed as a percentage value on a scale of 0 to 100%
• AIM: HIGH
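
The sketch below shows one simple way to compute an F-Measure for a single segment, using word overlap between the MT output and a reference translation. Word-level bag-of-words matching is an assumption made for illustration; the slides do not say how KantanMT matches words, and `f_measure` is a hypothetical helper name.

```python
from collections import Counter


def f_measure(candidate: str, reference: str) -> float:
    """Harmonic mean of word-level precision and recall, in the range 0..1."""
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    # Words shared by both sides, counted with multiplicity.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())  # share of output words that are correct
    recall = overlap / sum(ref.values())      # share of reference words that were produced
    return 2 * precision * recall / (precision + recall)
```

Because it is a harmonic mean, the score is pulled towards the weaker of the two values, so an engine cannot score well by being strong on precision alone or on recall alone.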

Measure – KantanMT engine calibration
TER Score
• A method to help predict the post-editing effort
• TER is quick to use and correlates highly with actual post-editing effort
• A TER score is a value in the range of 0 to 100%
• AIM: LOW
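
As an illustration of why a low TER suggests less post-editing, the sketch below counts the word-level edits (insertions, deletions, substitutions) needed to turn the MT output into the reference and divides by the reference length. This is a simplification: full TER also allows block shifts of word sequences, which are omitted here, and `simple_ter` is a made-up name.

```python
from typing import List


def word_edit_distance(hyp: List[str], ref: List[str]) -> int:
    """Levenshtein distance over words, computed row by row."""
    previous = list(range(len(ref) + 1))
    for i, hyp_word in enumerate(hyp, start=1):
        current = [i]
        for j, ref_word in enumerate(ref, start=1):
            cost = 0 if hyp_word == ref_word else 1
            current.append(min(
                previous[j] + 1,         # delete a word from the hypothesis
                current[j - 1] + 1,      # insert a missing reference word
                previous[j - 1] + cost,  # keep or substitute a word
            ))
        previous = current
    return previous[-1]


def simple_ter(hypothesis: str, reference: str) -> float:
    """Edits per reference word; shift operations of full TER are not modelled."""
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        raise ValueError("reference must not be empty")
    return word_edit_distance(hyp, ref) / len(ref)
```

Note that this simplified ratio can exceed 1.0 when the MT output is much longer than the reference.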


Measure – KantanMT engine calibration
Word-counts
• At least 1.5-2.0 million words are needed to build a predictable, quality KantanMT engine
• With less than 2m words, the engine should be used only in a narrow domain
• A wide-domain engine needs in the order of 10-15m words of training data
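
The word-count guidance above can be turned into a quick sanity check on a training set. The sketch below counts whitespace-separated words and maps the total to the bands quoted on the slide; the exact cut-off values, the band wording and the function names are assumptions made for this example, not KantanMT rules.

```python
from typing import Iterable

NARROW_DOMAIN_MIN_WORDS = 1_500_000  # lower end of the 1.5-2.0m guideline
WIDE_DOMAIN_MIN_WORDS = 10_000_000   # lower end of the 10-15m guideline


def count_words(segments: Iterable[str]) -> int:
    """Count whitespace-separated words across all segments."""
    return sum(len(segment.split()) for segment in segments)


def coverage_guidance(word_count: int) -> str:
    """Map a training word count to a rough guidance band."""
    if word_count < NARROW_DOMAIN_MIN_WORDS:
        return "below guideline: add more training data"
    if word_count < WIDE_DOMAIN_MIN_WORDS:
        return "narrow domain only"
    return "suitable for a wide domain"
```

Whitespace splitting is only a rough count and does not suit languages written without spaces between words.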

Measure – KantanMT engine calibration
Track your scores using KantanWatch™

Measure – KantanMT engine calibration
Compare Engine Scores & Performance

Build – Measure – Learn
[Diagram: Build, Measure, Learn cycle around the KantanMT Engine]

Learn – KantanMT Experimentation
Running and learning from your first translation job
• BLEU: 24%
• F-Measure: 50%
• TER: 66%
• Wordcount: 172K

Learn – KantanMT Experimentation
Learn from examining the output
Scores: BLEU Low, F-Measure OK, TER High, Wordcount Low
Catalog Errors
• Untranslated text
• Incorrect numeric formatting
• Invalid characters
• High level of post-editing required
Conclusions
• Engine coverage is bad due to low wordcount
• Post-editing is high due to low engine coverage
• Training data doesn’t contain correct numeric formatting
• Bad formatting in training data

Learn – KantanMT Experimentation
Learn from examining the output
Scores: BLEU Low, F-Measure OK, TER High, Wordcount Low
Action Plan
• Coverage – More training data required, relevant and of high quality. Also use a Glossary File to improve terminology consistency and accuracy.
• Numeric Formatting – Use a PEX rule to post-edit the translation and fix numeric formats
• Invalid Character – Use a PEX rule to fix the invalid character issue
• Post-Editing – By increasing the quantity of training data the KantanMT engine will perform better overall

Build – Action Plan
Action Plan
• Coverage – More training data required, relevant and of high quality
• Post-Editing – By increasing the quantity of training data the KantanMT engine will perform better overall

Measure – Action Plan


Your latest scores are…

Measure – Action Plan


Results using more relevant, high quality Training Data
BLEU

F-Measure

64%
Excellent



TER

63%

33%

Very Good

Very Good

Wordcount
479K
Good

Previously…

Low

OK

High

Low

Learn/Build – Action Plan
Customise your Engine – Runtime customisation improves Quality too!
[Diagram: PEX Rules and TBX Files applied to the KantanMT Engine, leading to Higher Quality Machine Translation and Reduced Post-Editing]

Learn/Build – Action Plan
Action Plan
• Coverage – Use a Glossary File to improve terminology consistency and accuracy
• Numeric Formatting – Use a PEX rule to post-edit the translation and fix numeric formats
• Invalid Character – Use a PEX rule to fix the invalid character issue
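
The numeric-formatting and invalid-character fixes above are both pattern-based corrections applied to the MT output. The sketch below shows the general idea with Python regular expressions; it does not use KantanMT's PEX rule syntax, and the patterns, the separator swap (English to continental European style) and the name `post_edit` are assumptions made for illustration.

```python
import re

# Numbers such as 1,234.56 or 0.75; the lookarounds avoid matching inside longer digit runs.
NUMBER = re.compile(r"(?<!\d)(?:\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+\.\d+)(?!\d)")
# Unicode replacement character plus control characters (tab, LF and CR are kept).
INVALID = re.compile(r"[\ufffd\x00-\x08\x0b\x0c\x0e-\x1f]")


def swap_separators(match: re.Match) -> str:
    """Turn 1,234.56 into 1.234,56 by swapping commas and full stops."""
    return match.group(0).translate(str.maketrans(",.", ".,"))


def post_edit(segment: str) -> str:
    """Strip invalid characters, then fix numeric formatting."""
    segment = INVALID.sub("", segment)
    return NUMBER.sub(swap_separators, segment)
```

A rule like this also rewrites decimals in version numbers (2.5 becomes 2,5), so real rule sets usually need exceptions for such cases.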

Learn/Build – Action Plan
[Screenshots: PEX file and original output]

Build – Measure – Learn – The Results
Analyse output
• Untranslated text
• Numeric Formatting
• Invalid Character
IMPROVED QUALITY

Build – Measure – Learn
• Human Post-Editing as part of the Learn step
• Take the KantanMT output
• Have it post-edited by a linguist
• Re-build the KantanMT Engine
• Rapidly improves the quality of your KantanMT Engine
[Diagram: Build, Measure, Learn cycle around the KantanMT Engine]

Learn – Action Plan
Post-Editing Feedback - Rapidly improves your KantanMT Engine.
[Diagram: KantanMT Engine → Machine Translation (XLIFF, TMX) → Post-Editing → Finalised, Publishable XLIFF, TMX → Rebuild KantanMT Engine → Higher Quality]
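
To feed post-edited segments back into training, they first have to be stored in an exchange format such as TMX. The sketch below builds a minimal TMX 1.4 document from source/post-edited pairs with Python's standard library. It is an illustration only: the `creationtool` value and the function name `build_tmx` are made up, and how the resulting file is uploaded to KantanMT is not covered here.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple

# Fully qualified name of the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs: Iterable[Tuple[str, str]], src_lang: str, tgt_lang: str) -> ET.Element:
    """Build a minimal TMX tree from (source, post-edited target) pairs."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "post-edit-export",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "plain",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, post_edited in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per segment pair
        for lang, text in ((src_lang, source), (tgt_lang, post_edited)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return tmx
```

Calling `ET.ElementTree(build_tmx(pairs, "en", "fr")).write("post_edits.tmx", encoding="utf-8", xml_declaration=True)` would save the result to a file (the file name is only an example).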

Summary – Build-Measure-Learn
Who can do this?
YOU CAN!

Summary – Build-Measure-Learn
You as an LSP or Language Professional provide:
• Extensive language expertise
• Skills to ensure accuracy and precision of your translation
• Management / maintenance of TMs for your clients for use in your CAT tools
KantanMT provides:
• Software and hardware to Build your engines
• Quality metrics to Measure the quality of your engine
• Tools and process to Learn and then teach your engine
• Support and help

Summary – Build-Measure-Learn
Follow this Build – Measure – Learn process
• KantanMT will increase productivity
• Process more words per hour, per day
Net result?
• Higher Earnings
• More Income
• Better Margins

Questions & Answers

Thank you!
Additional information
For additional information please visit:
http://www.kantanmt.com

Contact me at:
Kevin McCoy
E-mail: kevinmcc@kantanmt.com
Mobile: +353 86 823 1527

Improve your KantanMT Machine Translation Engine

  • 1. No Hardware. No Software. No Hassle MT.
  • 2. With KantanMT 3 Steps to Improve Quality
  • 3. What we aim to cover today? How to improve the quality of your MT Engine?  A Build – Measure – Learn process  How do we measure and quantify Quality in MT?  Practical illustrations throughout of KantanMT in action  Questions & Answers  3 Steps to Improve Quality
  • 4. 3-Steps to Higher Quality • Evolutionary Process Not a once off step • Continuous improvement loop • Incremental Improvements over time • • GIGO • • Build Build Build Kantan MT Engine Garbage In => Garbage Out Production Engine At least three iterations • Experimentation with different inputs • Measurements of different outputs • Control your own destiny • Learn Measure 3 Steps to Improve Quality
  • 5. Build – Building quality training streams. Training data: how KantanMT learns to translate and to mimic your style, terminology and fluency. Bad training data: Garbage In => Garbage Out. Three main factors: quality, relevance to domain, quantity.
  • 6. Build – Building quality training streams. Training data, three main factors. Quality: the linguistic quality of the training material is crucially important. Relevance to domain: a high-quality MT system has good domain knowledge, similar to the way you've always worked with Translation Memories and CAT tools. Quantity: the more training data you use to build your engine, the better its capacity to generate translations that mimic your translation style and terminology.
  • 7. Build – Building quality training streams. Balancing the equation [Diagram; surviving label: Quality]
  • 8. Build – is Quantity important? Not if quality is good – it's a balancing act! [Diagram; surviving label: Quality]
  • 9. Build – Building quality training streams. Quality training data, suitable sources: KantanMT Stock Engines; Translation Memories; other Translation Memories; monolingual (target-only) data. [Diagram labels: Language Base Data, Translation sources, Domain Base Data, Monolingual Data, Training Data]
  • 10. Build – Building quality training streams. Advantages of clean, high-quality training data: less correction of errors; finding the cause of errors is easier; gaps are easy to fill; faster processing time. By contrast, a large volume of dirty data makes correction difficult, makes finding the root cause of a problem challenging, and slows training and processing.
  • 11. Build – in KantanMT
  • 12. Build – Measure – Learn [Diagram: Build, Measure, Learn cycle around the KantanMT engine]
  • 13. Measure – KantanMT engine calibration. What to measure? BLEU, F-Measure, Word-counts, TER
  • 14. Measure – KantanMT engine calibration. BLEU score: a scoring system developed to automate the evaluation of translation quality; internationally recognised and the most widely used measure of the quality of an MT engine; scores a translation on a scale of 0 to 100%; the closer to 100%, the more closely the translation correlates with a human translation. AIM: HIGH
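To make the 0 to 100% scale concrete, here is a minimal sketch of a sentence-level BLEU calculation against a single reference. It assumes whitespace tokenisation and applies no smoothing, so it is an illustration of the idea rather than the scorer KantanMT uses; the function names `ngrams` and `sentence_bleu` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Simplified single-reference BLEU for one sentence, as a percentage.

    Assumptions: whitespace tokenisation, no smoothing, one reference.
    """
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clipped matches: a candidate n-gram counts at most as often
        # as it occurs in the reference.
        overlap = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, any empty n-gram level gives 0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalise candidates shorter than the reference.
    if len(cand) >= len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))
    return 100.0 * brevity_penalty * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    machine = "the cat sat on the mat"
    human = "the cat sat on a mat"
    print(f"BLEU: {sentence_bleu(machine, human):.1f}%")
```

Production scoring is normally done over a whole test set rather than one sentence, which is why single-sentence scores like this one tend to look harsh.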
  • 15. Measure – KantanMT engine calibration. F-Measure score: an automated measurement of an engine's precision and recall capabilities; a general guide to the overall quality performance of an engine; combines the recall and precision measurements into one score (in the standard F1 form, their harmonic mean); displayed as a percentage value on a scale of 0 to 100%. AIM: HIGH
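As an illustration of how precision and recall combine, the sketch below computes a word-level F1 score between an MT output and a human reference. It assumes whitespace tokenisation and bag-of-words matching; `f_measure` is a hypothetical name, and KantanMT's own calculation may differ in detail.

```python
from collections import Counter


def f_measure(candidate, reference):
    """Word-level F1 between candidate and reference, as a percentage.

    Assumptions: whitespace tokenisation; word order is ignored.
    """
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    # Counter intersection keeps the minimum count of each shared word.
    matches = sum((cand & ref).values())
    if matches == 0:
        return 0.0
    precision = matches / sum(cand.values())  # share of output words that are correct
    recall = matches / sum(ref.values())      # share of reference words that were produced
    return 100.0 * 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(f"F-Measure: {f_measure('a b', 'a b c d'):.1f}%")
```

The harmonic mean rewards balance: an output that is very precise but misses half of the reference words still scores well below 100%.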
  • 16. Measure – KantanMT engine calibration. TER score: a method to help predict post-editing effort; quick to use and correlates highly with actual post-editing effort; a TER score is a value in the range of 0 to 100%. AIM: LOW
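The link between TER and post-editing effort is easier to see in code. The sketch below is a simplification: it counts word-level insertions, deletions and substitutions and divides by the reference length. Full TER also allows block shifts and can exceed 100% for very poor output; the names `word_edit_distance` and `simple_ter` are hypothetical.

```python
def word_edit_distance(hyp, ref):
    """Levenshtein distance between two token lists."""
    previous = list(range(len(ref) + 1))
    for i, hyp_word in enumerate(hyp, start=1):
        current = [i]
        for j, ref_word in enumerate(ref, start=1):
            cost = 0 if hyp_word == ref_word else 1
            current.append(min(previous[j] + 1,          # delete a hypothesis word
                               current[j - 1] + 1,       # insert a reference word
                               previous[j - 1] + cost))  # match or substitute
        previous = current
    return previous[-1]


def simple_ter(hypothesis, reference):
    """Edits needed to turn the hypothesis into the reference, per reference word.

    Assumptions: whitespace tokenisation, no shift operation.
    """
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        raise ValueError("reference must not be empty")
    return 100.0 * word_edit_distance(hyp, ref) / len(ref)


if __name__ == "__main__":
    print(f"TER: {simple_ter('a b c d', 'a x c d'):.1f}%")
```

A lower value means fewer edits for the post-editor, which is why the aim for this metric is low while the aim for BLEU and F-Measure is high.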
  • 17. Measure – KantanMT engine calibration. Word-counts: at least 1.5-2.0 million words are needed to build a predictable, quality KantanMT engine; with fewer than 2m words, the engine should be used only in a narrow field-domain; a wide field-domain engine needs in the order of 10-15m words of training data.
  • 18. Measure – KantanMT engine calibration. Track your scores using KantanWatch™
  • 19. Measure – KantanMT engine calibration. Compare engine scores and performance
  • 20. Build – Measure – Learn [Diagram: Build, Measure, Learn cycle around the KantanMT engine]
  • 21. Learn – KantanMT Experimentation
  • 22. Learn – KantanMT Experimentation
  • 23. Learn – KantanMT Experimentation
  • 24. Learn – KantanMT Experimentation. Running and learning from your first translation job: BLEU 24%; F-Measure 50%; TER 66%; Wordcount 172K
  • 25. Learn – KantanMT Experimentation. Learn from examining the output. Score ratings: Low, OK, High, Low. Catalog errors: untranslated text; incorrect numeric formatting; invalid characters; high level of post-editing required. Conclusions: engine coverage is poor because of the low word count; post-editing is high because of the low engine coverage; the training data doesn't contain correct numeric formatting; the training data contains bad formatting.
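Cataloguing errors like those on this slide can be partly automated. The sketch below flags three of them for a single source/target segment pair. The checks are deliberately crude assumptions: identical source and target counts as untranslated, any difference in the numbers found counts as a numeric issue (which will also flag legitimate locale reformatting), and the Unicode replacement character or stray control characters count as invalid. `catalog_errors` is a hypothetical helper, not a KantanMT feature.

```python
import re

# Digits optionally grouped with "." or ",", e.g. 10, 3.14, 1,000.50
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def catalog_errors(source, target):
    """Return a list of error labels for one source/target segment pair."""
    errors = []
    # Target identical to a non-empty source: probably left untranslated.
    if source.strip() and source.strip() == target.strip():
        errors.append("untranslated text")
    # Numbers should normally survive translation unchanged; flag for review.
    if sorted(NUMBER.findall(source)) != sorted(NUMBER.findall(target)):
        errors.append("numeric formatting differs from source")
    # U+FFFD or control characters usually indicate an encoding problem.
    if "\ufffd" in target or any(ord(ch) < 32 and ch not in "\t\n\r" for ch in target):
        errors.append("invalid characters")
    return errors


if __name__ == "__main__":
    pairs = [("Page 10", "Seite 1\ufffd"), ("Hello", "Hello")]
    for src, tgt in pairs:
        print(repr(src), "->", catalog_errors(src, tgt))
```

Counting how often each label occurs across a translation job gives a rough error catalogue to set beside the BLEU, F-Measure and TER scores.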
  • 26. Learn – KantanMT Experimentation. Learn from examining the output. Score ratings: Low, OK, High, Low. Action plan: (1) Coverage: more training data required, relevant and of high quality; also use a Glossary File to improve terminology consistency and accuracy. (2) Numeric formatting: use a PEX rule to post-edit the translation and fix numeric formats. (3) Invalid character: use a PEX rule to fix the invalid character issue. (4) Post-editing: with a larger quantity of training data, the KantanMT engine will perform better overall.
  • 27. Build – Action Plan. (1) Coverage: more training data required, relevant and of high quality. (2) Post-editing: with a larger quantity of training data, the KantanMT engine will perform better overall.
  • 28. Measure – Action Plan. Your latest scores are…
  • 29. Measure – Action Plan. Results using more relevant, high-quality training data: BLEU F-Measure 64% Excellent TER 63% 33% Very Good Very Good Wordcount 479K Good. Previously: Low, OK, High, Low
  • 30. Learn/Build – Action Plan. Customise your engine: runtime customisation improves quality too! [Diagram: PEX rules and TBX files feed the KantanMT engine, giving higher-quality machine translation and reduced post-editing]
  • 31. Learn/Build – Action Plan. (1) Coverage: use a Glossary File to improve terminology consistency and accuracy. (2) Numeric formatting: use a PEX rule to post-edit the translation and fix numeric formats. (3) Invalid character: use a PEX rule to fix the invalid character issue.
  • 32. Learn/Build – Action Plan [Screenshots: PEX file; original output]. (1) Coverage: use a Glossary File to improve terminology consistency and accuracy. (2) Numeric formatting: use a PEX rule to post-edit the translation and fix numeric formats. (3) Invalid character: use a PEX rule to fix the invalid character issue.
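The PEX file itself is not reproduced in this transcript, so the sketch below only illustrates the general idea of rule-based post-editing with ordinary Python regular expressions. It is not PEX syntax; the rules, the `RULES` list and the `post_edit` function are all hypothetical, and the decimal-comma rule assumes a target locale that writes 3,14 rather than 3.14.

```python
import re

# Hypothetical search-and-replace rules for illustration only;
# this is NOT the KantanMT PEX file format.
RULES = [
    # Numeric formatting: decimal point between digits becomes a decimal comma.
    (re.compile(r"(?<=\d)\.(?=\d)"), ","),
    # Invalid character: drop the Unicode replacement character (U+FFFD).
    (re.compile("\ufffd"), ""),
    # Tidy up any double spaces left behind by the rules above.
    (re.compile(r" {2,}"), " "),
]


def post_edit(segment):
    """Apply every rule in order to one machine-translated segment."""
    for pattern, replacement in RULES:
        segment = pattern.sub(replacement, segment)
    return segment.strip()


if __name__ == "__main__":
    print(post_edit("Der Preis beträgt 3.14 Euro"))
```

Rules this blunt have side effects (a version string such as 1.2.3 would also be rewritten), so each rule is worth checking against real output before it is applied to a whole job.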
  • 33. Build – Measure – Learn – The Results. Analyse output: untranslated text; numeric formatting; invalid character. IMPROVED QUALITY
  • 34. Build – Measure – Learn. Human post-editing as part of the Learn step: take the KantanMT output; have a linguist post-edit it; rebuild the KantanMT engine. This rapidly improves the quality of your KantanMT engine. [Diagram: Build, Measure, Learn cycle around the KantanMT engine]
  • 35. Learn – Action Plan. Post-editing feedback rapidly improves your KantanMT engine. [Diagram: KantanMT engine produces machine translation (XLIFF, TMX); post-editing yields finalised, publishable XLIFF, TMX; rebuilding the KantanMT engine with it gives a higher-quality engine]
  • 36. Summary – Build-Measure-Learn
  • 37. Summary – Build-Measure-Learn
  • 38. Summary – Build-Measure-Learn. Who can do this? YOU CAN!
  • 39. Summary – Build-Measure-Learn. You, as an LSP or language professional, provide: extensive language expertise; the skills to ensure the accuracy and precision of your translations; management and maintenance of TMs for your clients for use in your CAT tools. KantanMT provides: the software and hardware to Build your engines; quality metrics to Measure the quality of your engine; tools and process to Learn and then teach your engine; support and help.
  • 40. Summary – Build-Measure-Learn. Follow this Build – Measure – Learn process: KantanMT will increase productivity, so you process more words per hour, per day. Net result? Higher earnings, more income, better margins.
  • 41. Questions & Answers. Thank you!
  • 42. Additional information. For additional information please visit http://www.kantanmt.com or contact Kevin McCoy (e-mail: kevinmcc@kantanmt.com, mobile: +353 86 823 1527)