Neural Machine Translation
via Binary Code Prediction
Yusuke Oda (1)
Philip Arthur (1)
Graham Neubig (2, 1)
Koichiro Yoshino (1, 3)
Satoshi Nakamura (1)
(1) Nara Institute of Science and Technology
(2) Carnegie Mellon University
(3) Japan Science and Technology Agency
2017/08/01 Copyright (c) 2017 by Yusuke Oda. All Rights Reserved. 2
Motivation of This Work
● Neural machine translation models tend to be HEAVY
– Due to the softmax output layer, which requires an O(V) matrix multiplication
● Goal: reduce the amount of computation in the output layer
Requirements of Output Layers
● Memory: less memory allows the model to be loaded on resource-restricted devices.
● Speed: less computation allows the model to run without expensive processors.
● Parallelism: model parallelization is also important to keep training fast.
Proposed: Binary Code Prediction
● Predicts the bits of word ID numbers instead of the word label itself.
● Softmax: O(V)
– Hidden layer h → softmax outputs (one score per word) → arg max → word ID
– Loss: cross entropy (or an equivalent one)
● Binary Code Prediction: O(log V)
– Hidden layer h → sigmoid outputs (predicted bits of the ID, e.g. 01 10) → word ID
– Loss: bit-wise errors (squared loss was used in the experiments)
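The binary code idea above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the most-significant-bit-first ordering, and the 0.5 threshold are all choices made here for the example.

```python
import math

def word_id_to_bits(word_id, num_bits):
    """Binary code of word_id as a list of 0/1 (MSB first; the ordering is an assumption)."""
    return [(word_id >> i) & 1 for i in reversed(range(num_bits))]

def bits_to_word_id(bits):
    """Inverse of word_id_to_bits."""
    word_id = 0
    for b in bits:
        word_id = (word_id << 1) | b
    return word_id

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict_word_id(logits):
    """Threshold each sigmoid output at 0.5 (assumed) and read the bits as a word ID."""
    bits = [1 if sigmoid(z) >= 0.5 else 0 for z in logits]
    return bits_to_word_id(bits)

def squared_bit_loss(logits, target_bits):
    """Sum of squared differences between sigmoid outputs and the target bits."""
    return sum((sigmoid(z) - t) ** 2 for z, t in zip(logits, target_bits))

# Number of output units needed to cover a vocabulary of V words.
vocab_size = 65536
num_bits = (vocab_size - 1).bit_length()
```

With V = 65536 the output layer needs only 16 sigmoid units, which matches the "Binary: 16 outputs" configuration reported in the results.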
Comparison of Word Prediction Methods
Methods                                                              Memory       Speed (Train)  Speed (Test)  Parallelism
Hierarchical Softmax (Morin & Bengio, 2005)                          O(V)         O(log V)       O(log V)      △
Differentiated Softmax (Chen+, 2016)                                 O(V/K)       O(V/K)         O(V/K)        ◎
Sampling (many works)                                                O(V)         O(K)           O(V)          ◯
Vocabulary Selection (Mi et al., 2016) (L'Hostis et al., 2016)       O(V)         O(V)           O(K)          ◯
Character/Subword (many works)                                       O(V')        O(V')          O(V')         ◎
Binary Code                                                          O(K+log V)   O(K+log V)     O(K+log V)    ◎
V: vocabulary size, K: model-specific parameter
Problems of Naïve Prediction
● Naïve binary code prediction models reduce accuracy.
● Cause 1: Robustness of bit arrays
● A bit error (even in only 1 bit) produces a different word.
→ Requires more redundant bit representations.
● Cause 2: Unbalanced word frequency
● Frequent words interfere with rare words because all words share the same parameters.
→ Use separate models for frequent and rare words.
Applying Error-correcting Codes (1)
● Introduces robustness into bit arrays.
Original code: 1 0 1
↓ Encoding
Redundant code: 1 0 1 1 0 1 1 0 1
Obtained code with errors: 1 0 1 0 0 1 1 1 1 (ERR at bits 4 and 8)
↓ Decoding
Restored code w/o errors: 1 0 1
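The example on this slide repeats the original code three times, so it can be reproduced with a simple repetition code and majority voting. The sketch below illustrates that slide example only; the function names are made up here, and the experiments themselves used a convolutional code rather than this scheme.

```python
def encode_repetition(bits, repeats=3):
    """Repeat the whole bit array `repeats` times: 1 0 1 -> 1 0 1 1 0 1 1 0 1."""
    return list(bits) * repeats

def decode_repetition(received, num_bits, repeats=3):
    """Recover each original bit by majority vote over its copies."""
    decoded = []
    for i in range(num_bits):
        votes = sum(received[i + r * num_bits] for r in range(repeats))
        decoded.append(1 if 2 * votes > repeats else 0)
    return decoded

original = [1, 0, 1]
redundant = encode_repetition(original)
# Received code from the slide, with errors at bits 4 and 8.
received = [1, 0, 1, 0, 0, 1, 1, 1, 1]
restored = decode_repetition(received, num_bits=len(original))
```

Each original bit has three copies, so a single flipped copy per bit position is outvoted and `restored` equals `original`.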
Applying Error-correcting Codes (2)
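● Introduces redundancy into bit arrays.
– We used a type of convolutional code that has some good characteristics for our model.
● Training: the word is encoded into a longer ground-truth bit array (encoding increases the number of bits), and the outputs computed from the hidden layer are trained against it.
● Test: the outputs computed from the hidden layer are decoded back into a word; decoding absorbs bit errors.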
Softmax+Binary (Hybrid) Model
● Directly predicts the N most frequent words by softmax.
– N is set according to the corpus difficulty.
– Softmax layer = frequent words and "OTHER" (softmax size N)
→ When "OTHER" is predicted, the binary layer is used.
● Frequent words: softmax → word scores
● Rare words ("OTHER"): sigmoid → bits of word ID
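The decision rule of the hybrid model can be sketched as follows. This is a hypothetical illustration: the function name, the position of "OTHER" inside the softmax, the 0.5 threshold, and the way rare-word IDs are numbered are all assumptions, not details taken from the slides.

```python
def hybrid_predict(softmax_scores, bit_probs, other_index):
    """Pick a frequent word from the softmax, or fall back to the binary layer.

    softmax_scores: scores over the frequent words and the OTHER class.
    bit_probs: sigmoid outputs of the binary layer (MSB first, assumed).
    other_index: position of OTHER in softmax_scores (assumed layout).
    """
    best = max(range(len(softmax_scores)), key=softmax_scores.__getitem__)
    if best != other_index:
        return ("frequent", best)
    # OTHER was predicted: read the rare-word ID from the thresholded bits.
    rare_id = 0
    for p in bit_probs:
        rare_id = (rare_id << 1) | (1 if p >= 0.5 else 0)
    return ("rare", rare_id)
```

For example, `hybrid_predict([0.1, 0.7, 0.2], [0.9, 0.1], other_index=2)` selects frequent word 1, while moving the highest score to the OTHER slot makes the function return the ID read from the bits.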
Experiments
Objective of Experiments
● Measures characteristics of binary code prediction models by comparing them with the softmax.
– Translation accuracy: BLEU
– Memory consumption: size of output layers
– Processing speed / parallelism: training on GPUs; testing on GPUs/CPUs (CPU: no multi-threading)
Results: Training Curves
[Figure: BLEU (0% to 35%) against #trained minibatches (x1000; 0 to 180) for Hybrid-512-EC, Hybrid-512, Binary-EC, Binary, and Softmax]
● Languages: En→Ja
– Domain: Scientific Papers (ASPEC 2M)
● Number of outputs per model:
– Softmax: 65536 outputs
– Binary: 16 outputs
– Binary-EC: 44 outputs
– Hybrid: 512+16 outputs
– Hybrid-EC: 512+44 outputs
● Naïve binary prediction is worse than the others.
● Two additional methods can improve accuracy
– Hybrid
– Error-correcting code
– Both combined
Results: Speed (ASPEC En→Ja)
[Figure: processing time [ms] (0 to 3000) for Train (GPU; per minibatch), Test (GPU; per sent.), and Test (CPU; per sent.), comparing Softmax with our models: Binary, Hybrid-512, Hybrid-2048, Binary-EC, Hybrid-512-EC, Hybrid-2048-EC]
● 20-30% faster on GPUs
→ No extra cost on parallel computation
● x10 faster on CPUs
→ Runs fast even on low-power devices.
Summary
● Proposed method
– NMT output layer based on binary code prediction
– 2 model improvements
● Hybrid models with Softmax and binary code
● Applying error-correcting codes
● Results
– Comparable BLEU with Softmax
– Reduces size of output layers to 1/10
– Speed-up (especially on CPUs by x10)
● Future work
– Introducing more efficient raw bit arrays/losses/error-correction
– Analyzing the model (what changes when binary codes are introduced?)
