‘Machine Translation 101’
The Latest Advances in Translation Technology
for Patent Information
Dr. John Tinsley
CEO, Iconic Translation Machines Ltd.
EPOPIC. Copenhagen. 11th November 2015
The need for translation
50% of all PCT applications in 2013 came from Asia
Why listen to me?
Machine Translation is what I do!
  BSc in Computational Linguistics
  PhD in Machine Translation
  Language Technology consultant
  Founder of Iconic Translation Machines
IPTranslator: Patent Translation by Iconic Translation Machines
The world’s first and only patent-specific machine translation platform
Machine Translation: The Basics
Machine Translation = automatic translation
§  The use of computers to translate from one language into another
§  The use of computers to automate some, or all, of the translation process
Statistical Machine Translation (SMT)
§  An approach to Machine Translation in which translations for an input are estimated from previously seen translation examples and their associated (inferred) probabilities
§  e.g. IPTranslator, Google Translate
Other approaches
§  Rule-based (or transfer-based): based on linguistic rules
•  e.g. Systran; Altavista’s Babelfish
§  Example-based: based on translation examples and inferred linguistic patterns
SMT is now by far the predominant approach
Bilingual Corpora
A corpus (pl. corpora) is a collection of texts, in electronic format, in a single language
§  document(s)
§  book(s)
A bilingual corpus is a collection of corresponding texts, in multiple languages
§  a document & its translation
§  a book in multiple languages
§  European Parliament proceedings
Note: source language = the original language, i.e. the language we’re translating from
target language = the language we’re translating into
Aligned Bilingual Corpora
A document-aligned bilingual corpus corresponds at the document level
For translation, we require sentence-aligned bilingual corpora
§  The sentence on line 1 of the source language text corresponds to (i.e. is a translation of) the sentence on line 1 of the target language text, and so on
§  Often referred to as parallel aligned corpora
Sentence-aligned bilingual parallel corpora are essential for statistical machine translation
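As a minimal sketch of what "sentence-aligned" means in practice, the snippet below pairs two lists of lines by position. The sentences and the function name `load_sentence_pairs` are illustrative assumptions, not part of the original slides.

```python
# Hypothetical toy data: line i of source_lines is translated by line i of target_lines.
source_lines = ["I have a cat", "the dog sleeps"]
target_lines = ["Tengo un gato", "el perro duerme"]


def load_sentence_pairs(source_lines, target_lines):
    """Pair sentences by line number; reject corpora whose line counts differ."""
    if len(source_lines) != len(target_lines):
        raise ValueError("corpus is not sentence-aligned: line counts differ")
    return list(zip(source_lines, target_lines))


pairs = load_sentence_pairs(source_lines, target_lines)
for source, target in pairs:
    print(f"{source}  <->  {target}")
```

Checking the line counts is only a sanity test; it cannot tell whether line 1 really is a translation of line 1, which is why alignment quality comes up again under training data.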
Learning from Previous Translations
Suppose we already know
(from a sentence-aligned bilingual
corpus) that:
§  “dog” is translated as “perro”
§  “I have a cat” is translated as
“Tengo un gato”
We can theoretically translate:
§  “I have a dog” → “Tengo un perro”
§  Even though we have never seen “I
have a dog” before
Statistical machine translation induces information about unseen input, based on
previously known translations:
§  Primarily co-occurrence statistics
§  Takes contextual information into account
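The cat/dog example can be sketched as a toy substitution. This is a deliberately simplified illustration, not how a real SMT system works: it assumes we also know that “cat” is translated as “gato” (implied by the slide but not stated), and the names `known_sentences`, `known_words` and `translate_by_substitution` are hypothetical.

```python
# Knowledge assumed to come from a sentence-aligned bilingual corpus.
known_sentences = {"I have a cat": "Tengo un gato"}
known_words = {"dog": "perro", "cat": "gato"}  # "cat" -> "gato" is an added assumption


def translate_by_substitution(sentence):
    """Translate a sentence that differs from a known one by exactly one known word."""
    new_words = sentence.split()
    for known_source, known_target in known_sentences.items():
        old_words = known_source.split()
        if len(old_words) != len(new_words):
            continue
        differences = [(old, new) for old, new in zip(old_words, new_words) if old != new]
        if len(differences) != 1:
            continue
        old, new = differences[0]
        if old in known_words and new in known_words:
            # Swap the translation of the old word for the translation of the new one.
            return known_target.replace(known_words[old], known_words[new])
    return None  # nothing similar enough has been seen before


print(translate_by_substitution("I have a dog"))  # Tengo un perro
```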
Statistical Machine Translation
§  Example of a small sentence-aligned bilingual corpus for English-French
§  We take some new sentence to translate
§  From the corpus we can infer possible target (French) translations for various source (English) words
§  We can then select the most probable translations based on simple frequencies (co-occurrence statistics)
§  Given a previously unseen input sentence and our collated statistics, we can estimate a translation
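The steps above can be sketched with a toy corpus. The four sentence pairs below are hypothetical (the corpus shown on the original slide is not reproduced here), and the Dice coefficient is just one possible co-occurrence statistic, chosen for this sketch because raw counts alone would tie frequent words such as “la” with the correct translation.

```python
from collections import Counter

# Hypothetical sentence-aligned English-French toy corpus.
corpus = [
    ("the house", "la maison"),
    ("the flower", "la fleur"),
    ("the blue flower", "la fleur bleue"),
    ("a blue car", "une voiture bleue"),
]


def train_dice(pairs):
    """Score each (source word, target word) pair by how often they co-occur."""
    source_freq, target_freq, joint_freq = Counter(), Counter(), Counter()
    for source, target in pairs:
        source_words, target_words = set(source.split()), set(target.split())
        source_freq.update(source_words)
        target_freq.update(target_words)
        joint_freq.update((s, t) for s in source_words for t in target_words)
    # Dice coefficient: 2 * joint count / (source count + target count).
    return {
        (s, t): 2 * n / (source_freq[s] + target_freq[t])
        for (s, t), n in joint_freq.items()
    }


def best_translation(word, scores):
    """Pick the target word with the highest score for this source word."""
    candidates = {t: score for (s, t), score in scores.items() if s == word}
    if not candidates:
        return word  # unseen word: pass it through unchanged
    return max(candidates, key=candidates.get)


def translate_word_by_word(sentence, scores):
    return " ".join(best_translation(word, scores) for word in sentence.split())


scores = train_dice(corpus)
print(translate_word_by_word("the blue house", scores))  # la bleue maison
```

The unseen sentence “the blue house” comes out as “la bleue maison”: each word is right, but the order is not (French would say “la maison bleue”). Word-by-word statistics alone do not capture word order.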
Advanced MT
All modern approaches are based on building translations for complete
sentences by putting together smaller pieces of translation
The previous example is very simplistic
§  In reality SMT systems calculate much more complex statistical models over millions of sentence pairs for a pair of languages
§  Upwards of 2M sentence pairs on average for large-scale systems
Other statistics calculated include
§  Word-to-word translation probabilities
§  Phrase-to-phrase translation probabilities
§  Word order probabilities
§  Linguistic information (are the words nouns, verbs?)
§  Fluency of the final output
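Fluency, one of the statistics on this slide, is commonly estimated with an n-gram language model trained on target-language text. The slides do not say which model is used, so the following is only a sketch under that assumption: a bigram model with add-one smoothing over three hypothetical French sentences.

```python
import math
from collections import Counter

# Hypothetical monolingual target-language (French) training sentences.
target_sentences = ["la maison bleue", "la fleur bleue", "une maison rouge"]


def train_bigram_model(sentences):
    """Count bigrams and their histories, with sentence boundary markers."""
    histories, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        histories.update(tokens[:-1])  # every token except the last starts a bigram
        bigrams.update(zip(tokens, tokens[1:]))
    vocabulary = {token for s in sentences for token in s.split()} | {"</s>"}
    return histories, bigrams, len(vocabulary)


def log_probability(sentence, model):
    """Log-probability of a sentence; higher means more fluent under the model."""
    histories, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for previous, word in zip(tokens, tokens[1:]):
        # Add-one smoothing keeps unseen bigrams from scoring log(0).
        total += math.log((bigrams[(previous, word)] + 1) / (histories[previous] + vocab_size))
    return total


model = train_bigram_model(target_sentences)
for candidate in ("la maison bleue", "la bleue maison"):
    print(candidate, round(log_probability(candidate, model), 3))
```

With this toy data the correctly ordered “la maison bleue” scores higher than “la bleue maison”, which is how a fluency score lets a system prefer one word order over another.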
Data is Key
For SMT, data is key
§  Information (word/phrase correspondences and associated statistics) is based only on what we have seen before in the training data
It is important that the data used to train SMT systems is:
§  Of sufficient size
•  avoid sparseness/skewed statistics
§  Representative and relevant
•  contains the right type of language
§  High-quality
•  absence of misspellings, incorrect alignments etc.
•  proofed by human translators
Why is MT Difficult?
A word or a phrase can have more than one meaning (ambiguity – lexical or
structural)
§  e.g. “bank”, “dive”, “I saw the man with the telescope”
People use language creatively
§  New words are cropping up all the time
Linguistic differences between languages
§  e.g. structure of Irish sentences vs. structure of English sentences:
§  “Tá (Is) ocras (hunger) orm (on me)” <-> “I am hungry”
There can be more than one way to express the same meaning.
§  “New York”, “The Big Apple”, “NYC”
Why is MT Difficult?
§  Israeli officials are responsible for airport security.
§  Israel is in charge of the security at this airport.
§  The security work for this airport is the responsibility of the Israel government.
§  Israeli side was in charge of the security of this airport.
§  Israel is responsible for the airport’s security.
§  Israel is responsible for safety work at this airport.
§  Israel presides over the security of the airport.
§  Israel took charge of the airport security.
§  The safety of this airport is taken charge of by Israel.
§  This airport’s security is the responsibility of the Israeli security officials.
No single solution for all languages
Number agreement: the house / the houses vs. la maison / les maisons
Gender agreement: the house / the cheese vs. la maison / le fromage
English - Spanish
English - French
No single solution for all languages
English - German
English - Chinese
种水果的农民
The farmer who grows fruit
[Lit: “grow fruit (particle) farmer”]
The Challenge of Patents
L is an organic group selected from -CH2-
(OCH2CH2)n-, -CO-NR'-, with R'=H or
C1-C4 alkyl group; n=0-8; Y=F, CF3 …
maximum stress of 1.2 to 3.5 N/mm<2>
and a maximum elongation of 700 to
1,300% at 0[deg.] C.
Long Sentences
Technical constructions
Largest single document: 249,322 words
Longest Sentence: 1,417 words
The Challenge of Patents
§  Very long sentences as standard
§  Grammatically incomplete, using nominal and telegraphic style (!)
§  Passive forms are frequent
§  Frequent use of subordinate clauses, participles, implicit constructs
§  Inconsistent and incorrect spelling
§  High use of neologisms
§  Instances of synonymy and polysemy
§  Spurious use of punctuation
Authoring guide for “to be translated” text
Patents break almost all of the rules!
Evaluating Machine Translation Quality
Automatic Evaluation
§  Judge the quality of an MT system by comparing its output against a human-produced “reference” translation
§  Pros: Quick, cheap, consistent
§  Cons: Inflexible, cannot be used on ‘new’ input
Human Evaluation
§  Pros: Reliable, flexible, multi-faceted (fluency, error analyses, benchmarking)
§  Cons: Slow, expensive, subjective
Task-Based Evaluation
§  Fluency vs. Adequacy
Task-Based Evaluation
§  Standalone evaluation of MT systems is necessary to get a sense of the overall quality of a system
§  To determine the ultimate usability of an MT system, extrinsic task-based evaluation is required
§  Why? Fluency vs. Adequacy
Fluency: how fluent and grammatically correct the translation output is
Adequacy: how accurately the translation conveys the meaning of the source
Source: La gran casa roja
Output 1: The big blue house (fluent, but not adequate: “roja” means red)
Output 2: The big house red (adequate, but not fluent)
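The two outputs above can also be scored automatically against a reference translation. The sketch below computes clipped n-gram precision, the building block of metrics such as BLEU. It is not a full BLEU implementation, and the reference sentence “The big red house” is an assumption added for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, reference, n):
    """Fraction of candidate n-grams that also appear in the reference."""
    candidate_counts = Counter(ngrams(candidate.lower().split(), n))
    reference_counts = Counter(ngrams(reference.lower().split(), n))
    total = sum(candidate_counts.values())
    if total == 0:
        return 0.0
    # Clip each n-gram's credit at the number of times the reference contains it.
    matched = sum(min(count, reference_counts[gram]) for gram, count in candidate_counts.items())
    return matched / total


reference = "The big red house"  # assumed human reference for "La gran casa roja"
for output in ("The big blue house", "The big house red"):
    unigram = clipped_precision(output, reference, 1)
    bigram = clipped_precision(output, reference, 2)
    print(f"{output}: unigram={unigram:.2f} bigram={bigram:.2f}")
```

Unigram precision rates Output 2 as perfect (1.00) even though its word order is wrong, while Output 1 scores 0.75. On bigrams both drop to 0.33. This illustrates why a single automatic score is quick and consistent but inflexible.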
Practical uses of Machine Translation
Understand its limitations and you’ll understand
its capabilities!
No
§  Translate a patent for filing
§  Translate literature for
publication
§  Translate marketing materials
§  Anything mission critical
without review
Yes
§  Productivity tool for
professional translation
§  Understand foreign patents
§  Localisation processes and “controlled” content
§  High volume, e.g. eDiscovery
Use cases in practice
Product descriptions
to open new markets
MT for post-editing
productivity across
industries
Developer, and user
for web content
Tens of thousands of
people using online
tools daily
Current Hot Topics
  Neural Networks
§  Using artificial intelligence and deep learning to develop a completely new way of doing machine translation!
  Quality Estimation
§  Functionality through which machine translation can “self-assess” the quality of the translations it produces.
  Online Adaptive Translation
§  Machine translations that can automatically learn and improve based on feedback, particularly from revisions.
  Use-case specific MT
§  Just like patent MT, but for countless other areas.
About Iconic
We provide Machine Translation solutions with Subject Matter Expertise
The Ensemble Architecture™
Combining linguistics, statistics, and MT expertise
[Diagram labels: Input; Output; Training Data; Client TM/terminology (optional); Patent input classifier; Chinese pre-ordering rules; Japanese script normalisation; Korean pharma tokenizer; German compounding rules; Spanish med-device entity recognizer; Moses (three instances); RBMT; Multi-output combination; Statistical post-editing]
Speed, Cost, and Quality
What is the difference between machine translation and manual translation when translating a 10-page patent document from Chinese into English?
Machine Translation is not designed to replace professional translation, but there are many cases where costly and time-consuming manual translation is simply not necessary.
-  Data confidentiality
-  File formats
-  Potential for customisation,
enhancements, and
improvement for specific
domains
Visit iptranslator.com and use the promo code epopic2015 to get 20 free pages of translation
Thank You!
john@iptranslator.com
@IconicTrans

The Latest Advances in Patent Machine Translation

  • 1. ‘Machine Translation 101’ The Latest Advances in Translation Technology for Patent Information Dr. John Tinsley CEO, Iconic Translation Machines Ltd. EPOPIC. Copenhagen. 11th November 2015
  • 2. The need for translation 50% of all PCT applications in 2013 came from Asia
  • 3. Why listen to me? BSc in Computational Linguistics; PhD in Machine Translation; Language Technology consultant; Founder of Iconic Translation Machines. Machine Translation is what I do! IPTranslator: Patent Translation by Iconic Translation Machines. The world’s first and only patent-specific machine translation platform
  • 4. Machine Translation: The Basics Machine Translation = automatic translation: §  The use of computers to translate from one language into another §  The use of computers to automate some, or all, of the translation process Statistical Machine Translation (SMT): §  An approach to Machine Translation, where translations for an input are estimated based on previously seen translation examples and associated (inferred) probabilities §  e.g. IPTranslator, Google Translate Other approaches: §  Rule-based (or transfer-based): based on linguistic rules, e.g. Systran; Altavista’s Babelfish §  Example-based: based on translation examples and inferred linguistic patterns SMT is now by far the predominant approach
  • 5. Bilingual Corpora A corpus (pl. corpora) is a collection of texts, in electronic format, in a single language §  document(s) §  book(s) A bilingual corpus is a collection of corresponding texts, in multiple languages §  a document & its translation §  a book in multiple languages §  European Parliament proceedings Note: source language = original language or language we’re translating from; target language = language we’re translating into
  • 6. Aligned Bilingual Corpora A document-aligned bilingual corpus corresponds on a document level For translation, we require sentence-aligned bilingual corpora §  The sentence on line 1 in the source language text corresponds to (i.e. is a translation of) the sentence on line 1 in the target language text etc. §  Often referred to as parallel aligned corpora Sentence-aligned bilingual parallel corpora are essential for statistical machine translation
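
As a rough illustration of sentence alignment, the sketch below pairs up two texts stored one sentence per line, so that line 1 of the source corresponds to line 1 of the target. The function name `align_sentences` and the second sentence pair are assumptions made for this example; they are not part of the talk.

```python
def align_sentences(source_text, target_text):
    """Pair line i of the source text with line i of the target text."""
    source_lines = source_text.splitlines()
    target_lines = target_text.splitlines()
    # A sentence-aligned corpus needs the same number of lines on both sides.
    if len(source_lines) != len(target_lines):
        raise ValueError("source and target have different numbers of sentences")
    return list(zip(source_lines, target_lines))


# Hypothetical two-sentence English-Spanish corpus.
english = "I have a cat\nThe dog sleeps"
spanish = "Tengo un gato\nEl perro duerme"
pairs = align_sentences(english, spanish)
```

Each element of `pairs` is one (source sentence, target sentence) tuple. A line-count check like this only catches gross misalignment; it cannot tell whether line 5 really is a translation of line 5.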
  • 7. Learning from Previous Translations Suppose we already know (from a sentence-aligned bilingual corpus) that: §  “dog” is translated as “perro” §  “I have a cat” is translated as “Tengo un gato” We can theoretically translate: §  “I have a dog” → “Tengo un perro” §  Even though we have never seen “I have a dog” before Statistical machine translation induces information about unseen input, based on previously known translations: §  Primarily co-occurrence statistics §  Takes contextual information into account
  • 8. Statistical Machine Translation §  Example of a small sentence-aligned bilingual corpus for English-French
  • 9. Statistical Machine Translation §  We take some new sentence to translate
  • 10. Statistical Machine Translation §  From the corpus we can infer possible target (French) translations for various source (English) words §  We can then select the most probable translations based on simple frequencies (co-occurrence statistics)
  • 11. Statistical Machine Translation Given a previously unseen input sentence, and our collated statistics, we can estimate a translation
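
The idea in slides 8 to 11 can be sketched in a few lines of Python. This is a toy under stated assumptions, not how IPTranslator or any production system works: the four-sentence corpus is invented, and the score is the Dice coefficient over sentence-level co-occurrence counts, swapped in here for the "simple frequencies" on the slide so that very common words such as "la" do not win every comparison.

```python
from collections import Counter
from itertools import product

# Hypothetical sentence-aligned English-French corpus.
CORPUS = [
    ("the house", "la maison"),
    ("the blue house", "la maison bleue"),
    ("the flower", "la fleur"),
    ("a blue flower", "une fleur bleue"),
]


def build_lexicon(corpus):
    """Map each source word to its best-scoring target word (Dice coefficient)."""
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in corpus:
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        # Every source word co-occurs with every target word in the pair.
        pair_count.update(product(src_words, tgt_words))
    lexicon = {}
    for s in src_count:
        scores = {
            t: 2 * pair_count[(s, t)] / (src_count[s] + tgt_count[t])
            for t in tgt_count
            if pair_count[(s, t)]
        }
        lexicon[s] = max(scores, key=scores.get)
    return lexicon


def translate_word_by_word(sentence, lexicon):
    """Replace each known word with its best translation, keeping source order."""
    return " ".join(lexicon.get(word, word) for word in sentence.split())


lexicon = build_lexicon(CORPUS)
unseen = translate_word_by_word("the blue flower", lexicon)
```

On this corpus the lexicon pairs "house" with "maison" and "blue" with "bleue", and the unseen input "the blue flower" comes out as "la bleue fleur". The words are right but the order is not, which is the kind of gap the word order and fluency statistics on the next slide are meant to close.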
  • 12. Advanced MT Previous example is very simplistic §  In reality SMT systems calculate much more complex statistical models over millions of sentence pairs for a pair of languages §  Upwards of 2M sentence pairs on average for large-scale systems Other statistics calculated include §  Word-to-word translation probabilities §  Phrase-to-phrase translation probabilities §  Word order probabilities §  Linguistic information (are the words nouns, verbs?) §  Fluency of the final output All modern approaches are based on building translations for complete sentences by putting together smaller pieces of translation
  • 13. Data is Key For SMT, data is key §  Information (word/phrase correspondences and associated statistics) is only based on what we have seen before in the data Important that the training data used for SMT systems is: §  Of sufficient size: avoid sparseness/skewed statistics §  Representative and relevant: contains the right type of language §  High-quality: absence of misspellings, incorrect alignments etc.; proofed by human translators
  • 14. Why is MT Difficult? A word or a phrase can have more than one meaning (ambiguity – lexical or structural) §  e.g. “bank”, “dive”, “I saw the man with the telescope” People use language creatively §  New words are cropping up all the time Linguistic differences between languages §  e.g. structure of Irish sentences vs. structure of English sentences: §  “Tá (Is) ocras (hunger) orm (on me)” <-> “I am hungry” There can be more than one way to express the same meaning. §  “New York”, “The Big Apple”, “NYC”
  • 15. Why is MT Difficult? §  Israeli officials are responsible for airport security. §  Israel is in charge of the security at this airport. §  The security work for this airport is the responsibility of the Israel government. §  Israeli side was in charge of the security of this airport. §  Israel is responsible for the airport’s security. §  Israel is responsible for safety work at this airport. §  Israel presides over the security of the airport. §  Israel took charge of the airport security. §  The safety of this airport is taken charge of by Israel. §  This airport’s security is the responsibility of the Israeli security officials.
  • 16. No single solution for all languages Number agreement: the house / the houses vs. la maison / les maisons Gender agreement: the house / the cheese vs. la maison / le fromage English - Spanish English - French
  • 17. No single solution for all languages English - German English - Chinese 种水果的农民 The farmer who grows fruit [Lit: “grow fruit (particle) farmer”]
  • 18. The Challenge of Patents Technical constructions: “L is an organic group selected from -CH2- (OCH2CH2)n-, -CO-NR'-, with R'=H or C1-C4 alkyl group; n=0-8; Y=F, CF3”; “… maximum stress of 1.2 to 3.5 N/mm<2> and a maximum elongation of 700 to 1,300% at 0[deg.] C.” Long sentences: Largest single document: 249,322 words; Longest sentence: 1,417 words
  • 19. The Challenge of Patents §  Very long sentences as standard §  Grammatically incomplete using nominal and telegraphic style (!) §  Passive forms are frequent §  Frequent use of subordinate clauses, participles, implicit constructs §  Inconsistent and incorrect spelling §  High use of neologisms §  Instances of synonymy and polysemy §  Spurious use of punctuation Authoring guide for “to be translated” text Patents break almost all of the rules!
  • 20. Evaluating Machine Translation Quality Automatic Evaluation: judge the quality of an MT system by comparing its output against a human-produced “reference” translation §  Pros: Quick, cheap, consistent §  Cons: Inflexible, cannot be used on ‘new’ input Human Evaluation: §  Pros: Reliable, flexible, multi-faceted (fluency, error analyses, benchmarking) §  Cons: Slow, expensive, subjective §  Fluency vs. Adequacy Task-Based Evaluation
  • 21. Evaluating Machine Translation Quality Task-Based Evaluation §  Standalone evaluation of MT systems is necessary to get a sense of the overall quality of a system §  To determine the ultimate usability of an MT system, intrinsic task-based evaluation is required §  Why? Fluency vs. Adequacy Fluency: how fluent and grammatically correct the translation output is Adequacy: how accurately the translation conveys the meaning of the source Source: La gran casa roja Output 1: The big blue house Output 2: The big house red
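
Automatic evaluation against a reference can be illustrated with a simple n-gram precision, sketched below. This is a simplification in the spirit of n-gram overlap metrics, not a full metric such as BLEU, and the talk does not say which metric it has in mind. The reference translation "The big red house" is an assumption added for the example; the two outputs are the ones from slide 21.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_precision(hypothesis, reference, n):
    """Fraction of hypothesis n-grams that also appear in the reference (clipped)."""
    hyp = ngrams(hypothesis.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(hyp.values())
    if total == 0:
        return 0.0
    # Clip each n-gram's credit to the number of times the reference contains it.
    matched = sum(min(count, ref[gram]) for gram, count in hyp.items())
    return matched / total


REFERENCE = "The big red house"   # assumed human reference translation
OUTPUT_1 = "The big blue house"   # fluent, but wrong colour
OUTPUT_2 = "The big house red"    # right words, wrong order

unigram_1 = ngram_precision(OUTPUT_1, REFERENCE, 1)
unigram_2 = ngram_precision(OUTPUT_2, REFERENCE, 1)
bigram_1 = ngram_precision(OUTPUT_1, REFERENCE, 2)
bigram_2 = ngram_precision(OUTPUT_2, REFERENCE, 2)
```

With this reference, Output 2 scores 1.0 on single words and Output 1 scores 0.75, while both score 1/3 on word pairs. Neither number alone separates fluency from adequacy, which is one reason automatic scores are usually read alongside human or task-based evaluation.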
  • 22. Practical uses of Machine Translation Understand its limitations and you’ll understand its capabilities! No: §  Translate a patent for filing §  Translate literature for publication §  Translate marketing materials §  Anything mission critical without review Yes: §  Productivity tool for professional translation §  Understand foreign patents §  Localisation processes and “controlled” content §  High volume, e.g. eDiscovery
  • 23. Use cases in practice Product descriptions to open new markets MT for post-editing productivity across industries Developer, and user for web content Tens of thousands of people using online tools daily
  • 24. Current Hot Topics Neural Networks §  Using artificial intelligence and deep learning to develop a completely new way of doing machine translation! Quality Estimation §  Functionality through which machine translation can “self-assess” the quality of the translations it produces. Online Adaptive Translation §  Machine translations that can automatically learn and improve based on feedback, particularly from revisions. Use-case specific MT §  Just like patent MT, but for countless other areas.
  • 25. We provide Machine Translation solutions with Subject Matter Expertise About Iconic
  • 26. The Ensemble Architecture™: combining linguistics, statistics, and MT expertise [Diagram, Input to Output. Components shown: Training Data; Client TM/terminology (optional); Patent input classifier; Chinese pre-ordering rules; Japanese script normalisation; German compounding rules; Korean pharma tokenizer; Spanish med-device entity recognizer; engines: Moses, Moses, Moses, RBMT; Multi-output Combination; Statistical Post-editing]
  • 27. IPTranslator: Patent Translation by Iconic Translation Machines
  • 28. Speed, Cost, and Quality What is the difference between machine translation and manual translation when translating a 10-page patent document from Chinese into English? Machine Translation is not designed to replace professional translation, but there are many cases where costly and time-consuming manual translation is simply not necessary.
  • 29. -  Data confidentiality -  File formats -  Potential for customisation, enhancements, and improvement for specific domains
  • 30. IPTranslatorPatent Translation by Iconic Translation Machines .com Visit and use the promo code epopic2015 to get 20 free pages of translation