No Hardware. No Software. No Hassle MT.
KantanMT Analytics - The Missing Link
What we aim to cover today
 The MT & Quality Relationship
 What is quality?
 Possible ways of measuring it
 Automated/Manual methods
 Who needs to measure quality
 Localisation stakeholders
 The Missing Link - KantanMT Analytics
   Segment level quality analysis
   Helping to build predictable business models
 Q&A
45 Mins Presentation
15 Mins Q&A

What is KantanMT.com?
 Statistical MT System
 Cloud-based
   Highly scalable
   Inexpensive to operate
   Quick to deploy
 Our Vision
 To put Machine Translation customization, improvement and deployment into your hands
Fully Operational 7 months
Active KantanMT Engines: 6,632
Training Words Uploaded: 23,653,605,925
Member Words Translated: 362,291,925

The Quality & MT Relationship
 Let’s agree a model for defining quality!

 Scale runs from No Quality (baseline) to Quality Target (defined by client)
 Taking into consideration the quality of MT outputs and the level of quality defined by your clients

Attributes of Quality
Language Attributes
 Adequacy
   Meaning of generated texts expressed in source/target
 Fluency
   Comprehensibility & readability
   Factors include grammar errors, word selection, syntax
Task-oriented Attributes
 Productivity
   Post-editing speed
 Acceptability
   Fit-for-purpose measurement
   Usable translations within the context of the end user/client
Attributes of Quality – Model
   Language (Translation Style): Fluency, Adequacy
   Task (Business Model): Productivity, Acceptability

Attributes of Quality
Attributes of Quality – Model
   Language (Translation Style): Fluency, Adequacy
   Task (Business Model): Productivity, Acceptability
 What we want: FuzzyMatch

Measuring MT Quality
 Automated
   Fast
   Repeatable
   Objective
   Scalable
   Cheap
   Based on samples
   Can’t be used by PMs for scope/cost predictions
 Manual
   Slow
   Cumbersome
   Subjective
   Not scalable
   Expensive
   Based on samples
   Can’t be used by PMs for scope/cost predictions

Measuring MT by hand!
 Sample Translations based on template
 Error categories (language and technical)
   Style
   Wrong terminology
   Wrong Spelling
   Source not Translated/Omissions
   Capitalization
   Syntax & Grammar
   Compliance with client specs
   Wrong Word Form
   Literal translation
   Wrong Part of Speech
   Text/Information added
   Punctuation
   Tags and Markup
   Sentence Structure
   Locale Adaptation
   Spacing
 Overall
   Adequacy Score
   Fluency Score
   Overall Quality Score

Manual Framework
 Adequacy Score (Range 1 – 5)
   5: Full Meaning. All meaning expressed in the source segment appears in the translated segment
   4: Most Meaning. Most of the source segment meaning is expressed in the translated segment
   3: Much Meaning. Much of the source segment meaning is expressed in the translated segment
   2: Little Meaning. Little of the source segment meaning is expressed in the translated segment
   1: No Meaning. None of the meaning expressed in the source segment is expressed in the translated segment

Manual Framework
 Fluency Score (Range 1 – 5)
   5: Native language fluency. No grammar errors, excellent word selection and good syntax. No post-editing required.
   4: Near native fluency. Few terminology/grammar errors. No impact on overall understanding of the meaning. Little post-editing required.
   3: Not very fluent. About half of translation contains errors and requires post-editing.
   2: Little fluency. Wrong word choice, poor grammar and syntax. A lot of post-editing required.
   1: No fluency. Absolutely ungrammatical and doesn’t make any sense. Re-translate from scratch.

Manual Framework
 Evaluation sheet columns
   Source
   MT Target
   Error categories: Spacing, Syntax and Grammar, Locale Adaptation, Tags and Markup, Sentence Structure, Punctuation, Wrong Part of Speech, Style, Wrong Word Form, Capitalization, Text/Information added, Literal translation, Compliance with client specs, Source not Translated/Omissions, Wrong Spelling, Wrong terminology
   Scores: Overall quality (1-4), Fluency (Score 1-5), Adequacy (Score 1-5)
   Group label: Tech

Manual Framework
Attributes of Quality – Model
   Language (Translation Style): Fluency, Adequacy
   Task (Business Model): Productivity, Acceptability
   Placed on the model: Manual Methods

Automated Methods
 Many different methods available
   BLEU, F-Measure, GTM, TER, NIST, METEOR, etc.
 Common characteristics
   Compute similarity of generated texts to reference texts
   The smaller the difference => the better the quality!
 Broad adoption
   Industry & Academia

Automated Methods
 F-Measure
 Recall & Precision Metric
Reference Translation
MT Output
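$\text{Recall} = \frac{\text{correct}}{\text{Ref-Len}} = 80\%$
$\text{Precision} = \frac{\text{correct}}{\text{MT-Len}} = 66\%$
$\text{F-Measure} = \frac{\text{Precision} \times \text{Recall}}{(\text{Precision} + \text{Recall}) / 2} = 73\%$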
 Flaw: no penalty for reordering
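A minimal Python sketch of the recall/precision/F-Measure calculation, under stated assumptions: tokens are split on whitespace, "correct" means words shared by the MT output and the reference (counted with multiplicity), and the two example sentences are invented for illustration (the slide's own sentences are not reproduced here). The sentences were chosen so that the figures land on roughly 80% / 66% / 73%.

```python
from collections import Counter
from typing import Tuple


def f_measure(reference: str, hypothesis: str) -> Tuple[float, float, float]:
    """Return (precision, recall, f_measure) for whitespace-tokenised text."""
    ref_tokens = reference.split()
    hyp_tokens = hypothesis.split()
    # "correct" = words of the MT output that also occur in the reference,
    # counted with multiplicity and ignoring word order.
    overlap = Counter(ref_tokens) & Counter(hyp_tokens)
    correct = sum(overlap.values())
    if correct == 0:
        return 0.0, 0.0, 0.0
    precision = correct / len(hyp_tokens)   # correct / MT-Len
    recall = correct / len(ref_tokens)      # correct / Ref-Len
    f = (precision * recall) / ((precision + recall) / 2)
    return precision, recall, f


# Invented example: 4 correct words, reference length 5, MT length 6.
reference = "the cat sat down quietly"
hypothesis = "quietly the cat sat on mat"
precision, recall, f = f_measure(reference, hypothesis)
print(f"recall={recall:.0%} precision={precision:.0%} f-measure={f:.0%}")

# The flaw noted above: shuffling the MT output leaves the score unchanged.
scrambled = "mat on sat cat the quietly"
print(f_measure(reference, scrambled) == (precision, recall, f))
```

Because only word counts are compared, any permutation of the MT output scores the same, which is exactly the reordering flaw.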
Automated Methods
 WER (Word Error Rate)
 Min number of edits to transform output to reference
Reference Translation
MT Output
$\text{WER} = \frac{\text{substitutions} + \text{insertions} + \text{deletions}}{\text{reference-length}}$




Levenshtein distance measure
General indicator of Post-Editing Effort
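The WER calculation can be sketched in Python as a word-level Levenshtein distance divided by the reference length. This is an illustrative implementation, not KantanMT's own: whitespace tokenisation and the example sentences are assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] = edit distance between the reference words processed
    # so far and the first j words of the MT output.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1       # reference word missing from the output
            insertion = current[j - 1] + 1   # extra word in the output
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# Invented example: one substitution ("sat" -> "sit") and one missing "the".
reference = "the cat sat on the mat"
hypothesis = "the cat sit on mat"
print(f"WER = {word_error_rate(reference, hypothesis):.2f}")
```

Unlike the F-Measure, this score does react to word order, since moving a word costs at least one deletion and one insertion.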
Automated Methods
 BLEU Score
 Put simply – measures how many words overlap, giving
higher scores to sequential words
 High correlation between BLEU and human judgement of
translation quality
Reference Translation

MT Output
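To make the "overlap, with extra credit for sequential words" idea concrete, here is a simplified BLEU-style score in Python. It is a sketch under assumptions, not the official BLEU implementation: one reference only, whitespace tokens, no smoothing, and invented example sentences. The name `simple_bleu` and the `max_n` parameter are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(reference: str, hypothesis: str, max_n: int = 4) -> float:
    """Geometric mean of clipped n-gram precisions times a brevity penalty."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not hyp:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        ref_counts = ngrams(ref, n)
        # Clipped matches: an n-gram counts at most as often as in the reference.
        matched = sum((hyp_counts & ref_counts).values())
        if matched == 0:
            return 0.0  # unsmoothed: any empty n-gram order zeroes the score
        log_precision_sum += math.log(matched / sum(hyp_counts.values()))
    # Penalise MT output that is shorter than the reference.
    if len(hyp) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(hyp))
    return brevity_penalty * math.exp(log_precision_sum / max_n)


reference = "the cat sat on the mat"
in_order = "the cat sat on a mat"
scrambled = "mat a on sat cat the"
# Same words, but only the in-order output shares word pairs with the reference.
print(simple_bleu(reference, in_order, max_n=2))
print(simple_bleu(reference, scrambled, max_n=2))
```

Both outputs contain the same words, yet the scrambled one scores zero at `max_n=2` because it shares no two-word sequence with the reference.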

Automated Methods
 KantanWatch™ can be used to track and monitor automated scores
* KantanWatch Reports

Automated Methods
 Improvements can be monitored during the build-measure-learn cycle of a KantanMT deployment
* KantanWatch Reports

Automated Methods
 Time-graphs offer a good overview of the maturing of a KantanMT engine
* KantanWatch Reports

Automated Methods
 Can also present a holistic view of the potential quality of KantanMT outputs
* KantanWatch Reports

Automated Methods
Attributes of Quality – Model
   Language (Translation Style): Fluency, Adequacy
   Task (Business Model): Productivity, Acceptability
   Metrics placed on the model: NIST, GTM, F-Measure, TER, BLEU, METEOR
 Major Flaw: All measurements based on reference translations

Who uses these measurements?
 The Localisation Stakeholder Dilemma
 Developers of MT Engines
   Automated BLEU, METEOR, F-Measure and TER are ideal and practical
   No individual measurement has absolute meaning, but each points the quality curve in the right direction within a domain

Who needs to measure Quality?
 The Localisation Stakeholder Dilemma
 Production Teams (PMs, LEs and QEs)
   Need segment measurements on quality and PE efforts
     Determine tiered segment post-edit rate
     Distribution of post-editing tasks based on segment quality
 Localisation Managers
   Need productivity measurements to predict budget and schedule
     Aka Project Segment Reports
     MT Measurements need to ‘fit’ business planning and charge models
 Translators
   Unfortunately, don’t get a fair deal
     No segment information, just top level project ‘inferences’ based on samples
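How a tiered segment post-edit rate might feed a budget prediction can be sketched as follows. Everything specific here is a hypothetical assumption for illustration: the 0-100 score scale, the tier thresholds, the tier names, the rate fractions and the per-word price are not taken from the slides.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical tiers: (minimum score, tier name, fraction of the full per-word rate).
TIERS = [
    (85, "light post-edit", 0.30),
    (60, "medium post-edit", 0.60),
    (0, "full post-edit", 1.00),
]


@dataclass
class Segment:
    text: str
    score: float  # assumed segment-level quality score on a 0-100 scale


def tier_for(score: float) -> Tuple[str, float]:
    """Return (tier name, rate fraction) for a segment score."""
    for minimum, name, rate in TIERS:
        if score >= minimum:
            return name, rate
    raise ValueError(f"score {score} is below every tier")


def project_cost(segments: List[Segment], full_rate_per_word: float) -> float:
    """Sum the per-segment post-editing cost using the tiered rates."""
    total = 0.0
    for segment in segments:
        _, rate = tier_for(segment.score)
        total += len(segment.text.split()) * full_rate_per_word * rate
    return total


segments = [
    Segment("Click the Save button", 92),
    Segment("The file could not be opened", 70),
    Segment("Restart device", 40),
]
for segment in segments:
    print(segment.text, "->", tier_for(segment.score)[0])
print(f"Estimated cost: {project_cost(segments, full_rate_per_word=0.10):.2f}")
```

The same per-segment tiers could also be used to route segments to different post-editors, which is the task distribution mentioned for production teams.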
The Quality & MT Relationship
   Measurements shown: Manual Methods, TER, BLEU, GTM, METEOR, F-Measure, NIST
   Stakeholders shown: MT Developers, Production

Conclusions
 There are many automated MT quality measurements
   Mostly suitable for MT developers
   Not optimal for production teams
   Of no use to translators
 All rely on reference texts to compute measurements
 What’s needed?
   Segment level measurements
     Drive project schedule and charge model
     High correlation to human effort
   Do not rely on reference texts to compute measurements

Attributes of Quality
Attributes of Quality – Model
   Language (Translation Style): Fluency, Adequacy
   Task (Business Model): Productivity, Acceptability
 What you want: KantanMT Analytics

Introducing KantanMT Analytics™
 Segment level scoring for MT output
 Designed to make it possible to create predictable
   Business Models
   Project Schedules
   Cost Models
 Co-developed by
   KantanMT.com
   CNGL – Centre for Next Generation Localisation

KantanMT Analytics™
 Select Analyse feature

KantanMT Analytics™
 KantanMT Analytics Report created
 XML based for consumption by TMS/GMS platforms

KantanMT Analytics™
 XLIFF document created

 Contains scores for each segment
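A TMS/GMS integration could read per-segment scores out of such a file roughly as below. The slides do not show where the scores sit inside the XLIFF document, so this sketch assumes, purely for illustration, that each `trans-unit` carries its score in the standard XLIFF 1.2 `match-quality` attribute of an `alt-trans` element. The sample document and the `segment_scores` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Hypothetical sample; the real report layout may differ.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="sample.txt" source-language="en" target-language="de" datatype="plaintext">
    <body>
      <trans-unit id="1">
        <source>Click Save.</source>
        <alt-trans match-quality="91"><target>Klicken Sie auf Speichern.</target></alt-trans>
      </trans-unit>
      <trans-unit id="2">
        <source>Restart the device.</source>
        <alt-trans match-quality="48"><target>Neustart Gerät das.</target></alt-trans>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def segment_scores(xliff_text: str) -> dict:
    """Map trans-unit id -> score, assuming the score is in alt-trans/@match-quality."""
    root = ET.fromstring(xliff_text)
    scores = {}
    for unit in root.iterfind(".//x:trans-unit", XLIFF_NS):
        alt = unit.find("x:alt-trans", XLIFF_NS)
        if alt is not None and alt.get("match-quality") is not None:
            scores[unit.get("id")] = float(alt.get("match-quality"))
    return scores


print(segment_scores(SAMPLE))
```

Once the scores are in a dictionary keyed by segment id, they can drive the scheduling and cost models that the report is designed for.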

The Missing Link
Attributes of Quality – Model
   Language (Translation Style): Fluency, Adequacy
   Task (Business Model): Productivity, Acceptability
   Placed on the model: KantanMT Analytics™
KantanMT Analytics: The Missing Link in Machine Translation

  • 1. No Hardware. No Software. No Hassle MT.
  • 2. KantanMT Analytics - The Missing Link
  • 3. What we aim to cover today?  The MT & Quality Relationship  What is quality?  Possible ways of measuring it  Automated/Manual methods  Who needs to measure quality  Localisation stakeholders  The Missing Link - KantanMT Analytics   Segment level quality analysis Helping to build predictable business models 45 Mins Presentation 15 Mins Q&A  Q&A KantanMT Analytics - The Missing Link
  • 4. What is KantanMT.com?  Statistical MT System  Cloud-based    Highly scalable Inexpensive to operate Quick to deploy  Our Vision  To put Machine Translation    Customization Improvement Deployment  into your hands Fully Operational 7 months Active KantanMT Engines 6,632 Training Words Uploaded 23,653,605,925 Member Words Translated 362,291,925 KantanMT Analytics - The Missing Link
  • 5. The Quality & MT Relationship  Let’s agree a model for defining quality! Quality Target (defined by client) No Quality (baseline)  Taking into consideration quality of MT outputs and level of quality defined by your clients. KantanMT Analytics - The Missing Link
  • 6. Attributes of Quality Attributes of Quality – Model Language Attributes  Adequacy   Fluency Adequacy Meaning of generated texts expressed in source/target  Fluency   Comprehensibility & readability Factors include    Task-oriented Attributes  Productivity  Post-editing speed  Acceptability   Fit-for-purpose measurement Usable translations within the context of the end user/client Acceptability Grammar errors word selection syntax Language Productivity Task KantanMT Analytics - The Missing Link
  • 7. Attributes of Quality Attributes of Quality – Model Language Attributes  Adequacy   Fluency Adequacy Meaning of generated texts expressed in source/target  Fluency   Comprehensibility & readability Factors include    Task-oriented Attributes  Productivity  Post-editing speed  Acceptability   Fit-for-purpose measurement Usable translations within the context of the end user/client Acceptability Grammar errors word selection syntax Language Translation Style Productivity Task Business Model KantanMT Analytics - The Missing Link
  • 8. Attributes of Quality Attributes of Quality – Model Language Attributes Task-oriented Attributes What we want? Fluency Adequacy Productivity Acceptability FuzzyMatch Language Translation Style Task Business Model KantanMT Analytics - The Missing Link
  • 9. Measuring MT Quality  Automated  Fast  Repeatable  Objective  Scalable  Cheap  Based on samples  Can’t be used by PMs  Scope/Cost predictions  Manual  Slow  Cumbersome  Subjective  Not scalable  Expensive  Based on samples  Can’t be used by PMs  Scope/Cost predictions KantanMT Analytics - The Missing Link
  • 10. Measuring MT by hand!  Sample Translations based on template Style Wrong terminology Wrong Spelling Source not Capitalization Translated/Omissions Syntax & Grammar Compliance with client specs Wrong Word Form Literal translation Part of Speech Wrong Text/Information added Punctuation Technical Tags and Markup Sentence Structure Locale Adaptation Overall Spacing Adequacy Score Fluency Score Overall Quality Score KantanMT Analytics - The Missing Link
  • 11. Manual Framework  Adequacy Score (Range 1 – 5) 5  Full Meaning  All meaning expressed in the source segment appears in the translated segment  Most Meaning  Most of the source segment meaning is expressed in the translated segment  Much Meaning  Much of the source segment meaning is expressed in the translated segment  Little Meaning  Little of the source segment is expressed in the translated segment  No Meaning  None of the meaning expressed in the source segment is expressed in the translated segment 1 KantanMT Analytics - The Missing Link
  • 12. Manual Framework: Fluency Score (range 1–5). 5 = Native language fluency: no grammar errors, excellent word selection and good syntax; no post-editing required. 4 = Near native fluency: few terminology/grammar errors; no impact on overall understanding of the meaning; little post-editing required. 3 = Not very fluent: about half of the translation contains errors and requires post-editing. 2 = Little fluency: wrong word choice, poor grammar and syntax; a lot of post-editing required. 1 = No fluency: absolutely ungrammatical and doesn't make any sense; re-translate from scratch.
  • 13. Manual Framework: scoring sheet. Columns: Source; MT Target; Spacing; Syntax and Grammar; Locale Adaptation; Tags and Markup; Sentence Structure; Punctuation; Wrong Part of Speech; Style; Wrong Word Form; Capitalization; Text/Information added; Literal translation; Compliance with client specs; Source not Translated/Omissions; Wrong Spelling; Wrong terminology; Overall quality (1-4); Fluency (score 1-5); Adequacy (score 1-5). Group label: Tech.
  • 14. Manual Framework. Attributes of Quality – Model. Language attributes: Fluency, Adequacy. Task-oriented attributes: Productivity, Acceptability. Manual Methods plotted on the model. Diagram labels: Language, Translation Style; Task, Business Model.
  • 15. Automated Methods. Many different methods available: BLEU, F-Measure, GTM, TER, NIST, Meteor, etc. Common characteristics: compute similarity of generated texts to reference texts; the smaller the difference, the better the quality. Broad adoption: industry and academia.
  • 16. Automated Methods: F-Measure, a recall and precision metric comparing MT output with a reference translation. Recall = correct / Ref-Len; Precision = correct / MT-Len; F-Measure = (Precision * Recall) / ((Precision + Recall) / 2). Example values, in that order: 80%, 66%, 73%. Flaw: no penalty for reordering.
  • 17. Automated Methods: WER (Word Error Rate), the minimum number of edits needed to transform the MT output into the reference translation. WER = (substitutions + insertions + deletions) / reference length. A Levenshtein distance measure; a general indicator of post-editing effort.
  • 18. Automated Methods: BLEU Score. Put simply, it measures how many words overlap between the MT output and the reference translation, giving higher scores to sequential words. High correlation between BLEU and human judgement of translation quality.
  • 19. Automated Methods: KantanWatch™ can be used to track and monitor automated scores. * KantanWatch Reports
  • 20. Automated Methods: Improvements can be monitored during the build-measure-learn cycle of a KantanMT deployment. * KantanWatch Reports
  • 21. Automated Methods: Time-graphs offer a good overview of the maturing of a KantanMT engine. * KantanWatch Reports
  • 22. Automated Methods: Can also present a holistic view of the potential quality of KantanMT outputs. * KantanWatch Reports
  • 23. Automated Methods. Attributes of Quality – Model. Language attributes: Fluency, Adequacy. Task-oriented attributes: Productivity, Acceptability. Metrics plotted on the model: NIST, GTM, F-Measure, TER, BLEU, METEOR. Diagram labels: Language, Translation Style; Task, Business Model. Major flaw: all measurements are based on reference translations.
  • 24. Who uses these measurements? The Localisation Stakeholder Dilemma. Developers of MT engines: automated BLEU, METEOR, F-Measure and TER are ideal and practical; no individual measurement has absolute meaning, but each points the quality curve in the right direction within a domain.
  • 25. Who needs to measure Quality? The Localisation Stakeholder Dilemma. Production teams (PMs, LEs and QEs): need segment measurements on quality and PE efforts; determine tiered segment post-edit rate; distribution of post-editing tasks based on segment quality. Localisation managers: need productivity measurements to predict budget and schedule (aka Project Segment Reports); MT measurements need to 'fit' business planning and charge models. Translators: unfortunately, don't get a fair deal; no segment information, just top-level project 'inferences' based on samples.
  • 26. The Quality & MT Relationship. Diagram labels: Manual Methods; TER, BLEU, GTM, METEOR, F-Measure, NIST; MT Developers; Production.
  • 27. Conclusions. There are many automated MT quality measurements: mostly suitable for MT developers; not optimal for production teams; of no use to translators. All rely on reference texts to compute measurements. What's needed? Segment-level measurements: drive project schedule and charge model; high correlation to human effort. Do not rely on reference texts to compute measurements.
  • 28. Attributes of Quality – Model. Language attributes: Fluency, Adequacy. Task-oriented attributes: Productivity, Acceptability. What you want… KantanMT Analytics. Diagram labels: Language, Translation Style; Task, Business Model.
  • 29. Introducing KantanMT Analytics™. Segment-level scoring for MT output. Designed to make it possible to create predictable business models, project schedules and cost models. Co-developed by KantanMT.com and CNGL (Centre for Next Generation Localisation).
  • 30. KantanMT Analytics™: Select Analyse feature.
  • 31. KantanMT Analytics™: Select Analyse feature.
  • 32. KantanMT Analytics™: KantanMT Analytics Report created; XML-based for consumption by TMS/GMS platforms.
  • 33. KantanMT Analytics™: XLIFF document created; contains scores for each segment.
  • 34. The Missing Link. Attributes of Quality – Model. Language attributes: Fluency, Adequacy. Task-oriented attributes: Productivity, Acceptability. KantanMT Analytics™ plotted on the model. Diagram labels: Language, Translation Style; Task, Business Model.
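
The F-Measure and WER formulas on slides 16 and 17 can be sketched in a few lines of Python. This is a minimal illustration, not KantanMT's implementation: the function names, the whitespace tokenisation, the lower-casing, and the example sentences are all assumptions made for the sketch. The F-Measure is written as 2PR / (P + R), which is algebraically the same as the slide's (Precision * Recall) / ((Precision + Recall) / 2).

```python
from collections import Counter


def precision_recall_f(mt_output: str, reference: str):
    """Bag-of-words precision, recall and F-Measure (hypothetical helper)."""
    mt = mt_output.lower().split()
    ref = reference.lower().split()
    # "correct" = words shared by both sides, each counted at most as often
    # as it occurs in the other side (multiset intersection).
    correct = sum((Counter(mt) & Counter(ref)).values())
    precision = correct / len(mt) if mt else 0.0    # correct / MT-Len
    recall = correct / len(ref) if ref else 0.0     # correct / Ref-Len
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def wer(mt_output: str, reference: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference length."""
    hyp = mt_output.lower().split()
    ref = reference.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                            # word missing from the output
                cur[j - 1] + 1,                         # extra word in the output
                prev[j - 1] + (ref_word != hyp_word),   # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "he reads the book slowly"
    mt_output = "he is reading the book slowly"
    print(precision_recall_f(mt_output, reference))
    print(wer(mt_output, reference))
    # Scrambled word order still gets a perfect F-Measure of 1.0,
    # which is the "no penalty for reordering" flaw noted on slide 16.
    print(precision_recall_f("slowly book the reads he", reference))
```

Because the F-Measure here works on unordered word counts, it cannot see word order at all; WER does, since the Levenshtein alignment is positional.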
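
Slide 18 describes BLEU as counting overlapping words while rewarding words that appear in sequence. The sketch below shows that idea in a deliberately simplified, sentence-level form: clipped n-gram precision for unigrams and bigrams only, combined by geometric mean and multiplied by a brevity penalty. Standard BLEU uses n-grams up to length 4 and corpus-level counts, so the function name `simple_bleu`, the `max_n=2` default, the tokenisation, and the example sentences are assumptions for illustration only.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(mt_output: str, reference: str, max_n: int = 2) -> float:
    """Simplified sentence-level BLEU-style score (hypothetical helper)."""
    hyp = mt_output.lower().split()
    ref = reference.lower().split()
    if not hyp:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        ref_counts = ngrams(ref, n)
        total = sum(hyp_counts.values())
        # Clipped matches: an n-gram is credited at most as many times
        # as it appears in the reference.
        matched = sum((hyp_counts & ref_counts).values())
        if total == 0 or matched == 0:
            return 0.0  # no smoothing in this sketch
        log_precision_sum += math.log(matched / total)
    # Brevity penalty: outputs shorter than the reference are penalised.
    if len(hyp) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(hyp))
    return brevity_penalty * math.exp(log_precision_sum / max_n)


if __name__ == "__main__":
    reference = "he reads the book slowly"
    print(simple_bleu("he is reading the book slowly", reference))
    # Same words, scrambled order: every unigram matches but no bigram does,
    # so the score drops to 0.0.
    print(simple_bleu("slowly book the reads he", reference))
```

The bigram term is what rewards sequential words: a scrambled output keeps its unigram matches but loses its bigram matches, unlike the order-blind F-Measure.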

Editor's Notes

  1. No more expensive deployments: monthly subscription plan; customised subscription plan. No more complexity: KantanMT does all the heavy lifting. You focus on what you do best: grow and develop your business.
  2. Flaw – no penalty for reordering