Accelerating Stochastic Gradient Descent Using Adaptive Mini-Batch Size
Authors:
● Muayyad Alsadi <alsadi@gmail.com>
● Rawan Ghnemat <r.ghnemat@psut.edu.jo>
● Arafat Awajan <awajan@psut.edu.jo>
What if you could just fast-forward through the training process?
8x
This way, training becomes feasible even on commodity CPUs (without GPUs), getting high accuracy within hours.
Background
Artificial Neural Network (ANN) / Some Types and Applications
● Fully connected multi-layer Deep Neural Networks (DNN)
● Convolutional Neural Network (CNN)
○ Spatial (Image)
○ Context (Text and NLP)
● Recurrent Neural Network (RNN)
○ Sequences (Text letters, Stock events)
Artificial Neural Network (ANN) / Some Types and Applications
● Convolutional Neural Network (CNN)
○ Spatial (Image): classification/regression
○ Context (Text and NLP): classification/regression
● Recurrent Neural Network (RNN)
○ Sequences (Text letters, Stock events)
■ Seq2Seq: translation, summarization, ...
■ Seq2Label
■ Seq2Value
Deep Learning / some challenges
● Massive number of trainable weights to tune
● Massive number of Multiply–Accumulate (MAC) operations
○ Low throughput (e.g., images/second)
● Vanishing/Exploding Gradients
○ Slow to converge
[Diagram] Input → Deep Neural Network → Output: millions of operations per item.
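To put "millions of operations per item" in perspective, here is a tiny sketch (my illustration, not from the slides; the layer shape is a hypothetical example) counting the MACs of a single convolutional layer:

# Hypothetical example: MAC count for one convolutional layer.
# MACs = out_h * out_w * out_channels * (kernel_h * kernel_w * in_channels)
def conv_macs(out_h, out_w, out_c, k_h, k_w, in_c):
    return out_h * out_w * out_c * (k_h * k_w * in_c)

# A 3x3 conv from 64 to 128 channels on a 56x56 feature map:
print(conv_macs(56, 56, 128, 3, 3, 64))  # 231,211,008 MACs, for one layer of one image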
[Diagram] A training step: a sampled batch is fed through the deep neural network, the output is compared against the given labels, and the weights are updated.
Batch Learning vs. Stochastic Learning
“Stochastic Learning” or “Stochastic Gradient Descent” (SGD) takes small random samples (mini-batches) of the training data instead of the whole set (“Batch Learning”). It converges faster and handles noise and non-linearity better, which is why batch learning was considered inefficient [1][2].
1. Y. LeCun, “Efficient backprop.”
2. D. R. Wilson and T. R. Martinez, “The general inefficiency of batch training for gradient descent learning.”
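To make the distinction concrete, a minimal NumPy sketch (my illustration, not the paper’s code; grad_fn stands for a hypothetical function returning the gradient of the loss):

import numpy as np

def sgd_step(w, X, y, grad_fn, lr=0.1, batch_size=8):
    # Stochastic learning: gradient on a small random mini-batch.
    idx = np.random.choice(len(X), size=batch_size, replace=False)
    return w - lr * grad_fn(w, X[idx], y[idx])

def batch_step(w, X, y, grad_fn, lr=0.1):
    # Batch learning: gradient over the entire training set.
    return w - lr * grad_fn(w, X, y)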
Factors Affecting Convergence Speed
[Diagram: a training step annotated with the factors below]
● Sample (mini-batch) size
● Design complexity / depth / number of MAC operators
● Number of classes
● Learning rate
● Momentum
● Optimization algorithm
Literature Review
● Sample size related
● Learning rate related
● Optimization algorithm related
● NN design related
● Transforming input/output
Literature Review / see paper
● Sample size related
○ Very large batch sizes (e.g., 8,192 images per batch)
○ Increasing the batch size during training
● Learning rate related
○ Per-dimension
○ Fading
○ Momentum
○ Cyclic
○ Warm restarts...
● Optimization algorithm related
○ AdaGrad, Adam, AdaDelta, ...
● NN design related
○ SqueezeNet, MobileNet
○ Separable operators
○ Batch-norm
○ Early AUX classifier branches
● Transforming input/output
○ Reusing an existing model (fine-tuning)
○ Knowledge transfer
Proposed Method
Do a very high-risk initialization using an extremely small mini-batch size (e.g., 4 or 8 samples per batch), then “Train-Measure-Adapt-Repeat”: as long as the results keep improving, keep using these fast-forwarding settings; when stuck, switch to a larger mini-batch size (for example, 32 samples per batch).
Proposed Method
ff_criteria can be defined with respect to the change in evaluation accuracy, like this:

if (acc_new > acc_old) then
    mode = ff
else
    mode = normal
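As a runnable version of that criterion (a sketch of the slide’s pseudocode; the function and dictionary names are my own):

def choose_mode(acc_new, acc_old):
    # Keep fast-forwarding while evaluation accuracy improves.
    return "ff" if acc_new > acc_old else "normal"

# Mini-batch size per mode, using the values from the slides.
BATCH_SIZE = {"ff": 8, "normal": 32}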
Use extremely small mini-batch size
● Especially for cold start (initialization)
● Instead of a very large batch size like 8,192 samples per batch, use an extremely small mini-batch size like 4 or 8 samples per batch (as long as the hardware is fully utilized)
● The network is still cold; it already performs badly, so you have nothing to lose.
Why does it tick faster?
Assuming the hardware is fully utilized and has constant throughput (images/second), processing a mini-batch of 8 images is 4 times faster than processing a batch of 32 images, so you make 4 times more updates.
A good guess for the batch size is the number of cores in your computer (the scope of the paper is training on commodity hardware).
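The arithmetic behind the claim, as a small sketch (the 100 images/second figure is an assumed, illustrative throughput):

throughput = 100.0  # images/second, assumed constant once the hardware is saturated
for batch_size in (8, 32):
    print(batch_size, throughput / batch_size)
# batch size 8  -> 12.5   weight updates per second
# batch size 32 ->  3.125 weight updates per second (4x fewer)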
It ticks faster, but does it converge faster?
By using a 4x smaller batch size, we make 4x more, higher-risk updates.
Batch size has a linear effect on speed, but its effect on accuracy is not linear.
Don’t look at accuracy by number of steps; look at accuracy over time.
Experiments: Fine-tuning Inception v1 pre-trained on the ImageNet 1K task.
Experiment: The Caltech-UCSD Birds-200-2011 Dataset
Experiment: Birds 200 Dataset
Accuracy over steps (misleading): accuracy of batch-size=10 (in cyan) is always below the others.
Accuracy over time: accuracy of batch-size=10 (in cyan) reached 56% in 2 hours, while the others were lagging behind at 40%, 28%, and 10%.
Experiment: The Oxford-IIIT Pet Dataset (Pets-37)
Experiment: Pets-37 Dataset
Eval accuracy over time: a mini-batch size of 8 reached 80% accuracy within only one hour.
Experiment: Adaptive part on the Birds-200 Dataset
Eval accuracy over time: reaching ~72% accuracy within ~2 hours 20 minutes.
Summary: Train-Measure-Adapt-Repeat
● Start with a very small mini-batch size and a large learning rate
○ BatchSize=4; LearningRate=0.1
● Let the mini-batch size be cyclic
○ Switch between two settings (batch sizes of 8 and 32)
○ Adaptive, non-periodic, based on evaluation accuracy
○ Change the bounds of the settings as you go (full loop sketched below)
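Putting the summary together, a minimal sketch of the whole loop (train_steps and evaluate are hypothetical placeholders for your framework’s training and evaluation calls; the step counts are illustrative):

def train_measure_adapt_repeat(train_steps, evaluate, total_steps=10000,
                               eval_every=500, ff_bs=8, normal_bs=32):
    # Adaptive, non-periodic switching between a small ("fast-forward")
    # mini-batch size and a larger one, driven by evaluation accuracy.
    acc_old = 0.0
    mode = "ff"  # start in fast-forward: the cold network has nothing to lose
    for _ in range(0, total_steps, eval_every):
        batch_size = ff_bs if mode == "ff" else normal_bs
        train_steps(eval_every, batch_size)             # Train
        acc_new = evaluate()                            # Measure
        mode = "ff" if acc_new > acc_old else "normal"  # Adapt
        acc_old = acc_new                               # Repeat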
Q & A
Thank you
Follow me on GitHub
http://muayyad-alsadi.github.io/
