A presentation on the research paper "Efficient BackProp": how to improve training time for neural networks, with tips and tricks for improving their learning efficiency.
3. Making a neural network work is more of an art than a science
Choices:
Number of nodes
Number of layers
Activation function
Learning rate
And so on
▐ Introduction
4. Trick 1: Stochastic versus Batch learning
Stochastic: update the weights after each training example.
Batch: accumulate the gradient over the whole training set, then update once (both contrasted in the sketch below).
▐ A Few Practical Tricks
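To make the distinction concrete, here is a minimal Python sketch contrasting the two modes on a toy least-squares problem; the data, model, learning rate, and epoch count are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: y = 2x + noise (illustrative assumption).
X = rng.uniform(-1.0, 1.0, size=(100, 1))
y = 2.0 * X[:, 0] + 0.1 * rng.standard_normal(100)

def grad(w, xb, yb):
    # Gradient of the mean squared error 0.5*mean((x.w - y)^2) w.r.t. w.
    return xb.T @ (xb @ w - yb) / len(yb)

eta = 0.1

# Batch learning: one weight update per full pass over the training set.
w_batch = np.zeros(1)
for epoch in range(50):
    w_batch -= eta * grad(w_batch, X, y)

# Stochastic (online) learning: one weight update per training example.
w_sgd = np.zeros(1)
for epoch in range(50):
    for i in rng.permutation(len(y)):
        w_sgd -= eta * grad(w_sgd, X[i:i + 1], y[i:i + 1])

print(w_batch, w_sgd)  # both approach the true weight 2.0
```

Stochastic updates are noisier but far more frequent, which is why they usually train faster on large, redundant datasets; batch updates are smoother and easier to analyze and accelerate.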
7. Trick 4: The Sigmoid
Trick: Use a symmetric sigmoid such as f(x) = 1.7159 tanh((2/3)x); the constants
are chosen so that f(±1) = ±1 with normalized inputs and targets.
▐ A Few Practical Tricks
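A minimal sketch of that sigmoid and its derivative, assuming numpy; only the formula itself comes from the paper:

```python
import numpy as np

def sigmoid(x):
    # Recommended symmetric sigmoid: f(x) = 1.7159 * tanh((2/3) * x).
    # The constants give f(1) = 1 and f(-1) = -1 for normalized data.
    return 1.7159 * np.tanh(2.0 / 3.0 * x)

def sigmoid_prime(x):
    # Derivative, as used in backpropagation.
    return 1.7159 * (2.0 / 3.0) / np.cosh(2.0 / 3.0 * x) ** 2

print(sigmoid(1.0), sigmoid(-1.0))  # ~1.0, ~-1.0
```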
8. Trick 5: Initializing the Weights
Very large or very small weights → small gradients → slow learning.
Weights should lie in the linear region of the sigmoid.
Advantages:
(1) Gradients will be large enough
(2) The linear part is easier for the network to learn
Trick: Initialize the weights from a zero-mean distribution with standard deviation
σ_w = 1/√m, where m = number of inputs (fan-in) of the unit (see the sketch below).
▐ A Few Practical Tricks
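A minimal sketch of that initialization, assuming numpy and a Gaussian (the rule only fixes the mean and standard deviation; the layer sizes below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def init_weights(m, n):
    # Zero-mean weights with standard deviation 1/sqrt(m),
    # where m is the fan-in (number of inputs) of each unit.
    # With normalized inputs this keeps the weighted sums inside
    # the linear region of the sigmoid.
    return rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))

W = init_weights(m=100, n=10)  # a layer with 100 inputs and 10 units
print(W.std())                 # ~0.1 = 1/sqrt(100)
```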
9. Trick 6: Choosing Learning Rates
Approach 1: Adjust the learning rate depending on the change of the weight vector.
Problem: cannot be applied to stochastic or online learning methods.
Approach 2: Maintain a different learning rate for each element of the weight vector,
using the second derivative of the error, so that all weights converge at roughly the same speed.
Trick: Learning rates should be proportional to the square root of the number of
connections sharing that weight (see the sketch after this slide).
▐ A Few Practical Tricks
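A minimal sketch of per-weight learning rates scaled by that sharing rule; the base rate, sharing counts, and gradient are illustrative assumptions:

```python
import numpy as np

base_eta = 0.01
# Number of connections sharing each weight (e.g. > 1 for the
# shared weights of a convolutional layer); illustrative values.
n_shared = np.array([1, 1, 4, 4, 9])
etas = base_eta * np.sqrt(n_shared)  # rate ∝ sqrt(#connections sharing it)

w = np.zeros(5)
g = np.ones(5)   # stand-in gradient for one update step
w -= etas * g    # element-wise update with per-weight rates
print(etas)
```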
14. ▐ Multiple Dimension Gradient
Hessian: a measure of the curvature of E in multiple dimensions, H_ij = ∂²E/∂w_i∂w_j.
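To make the definition concrete, here is a central finite-difference estimate of H_ij on a small quadratic test error, assuming numpy (a sketch only: the O(N²) error evaluations make this far too slow for real networks):

```python
import numpy as np

def hessian(E, w, eps=1e-4):
    # Central-difference estimate of H[i, j] = d^2 E / (dw_i dw_j).
    n = len(w)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            w1 = w.copy(); w1[i] += eps; w1[j] += eps
            w2 = w.copy(); w2[i] += eps; w2[j] -= eps
            w3 = w.copy(); w3[i] -= eps; w3[j] += eps
            w4 = w.copy(); w4[i] -= eps; w4[j] -= eps
            H[i, j] = (E(w1) - E(w2) - E(w3) + E(w4)) / (4 * eps ** 2)
    return H

# Quadratic test error E(w) = w0^2 + 3*w1^2, whose Hessian is [[2, 0], [0, 6]].
E = lambda w: w[0] ** 2 + 3 * w[1] ** 2
print(hessian(E, np.array([1.0, 1.0])))
```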
16. ▐ Second Order Optimization Methods
Newton Algorithm
The whitening transform, well known in signal processing, can convert the
ellipsoidal error surface into a spherical one.
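A minimal sketch of the whitening transform, assuming numpy; the covariance of the sample data below is an illustrative assumption:

```python
import numpy as np

def whiten(X):
    # Center the inputs, rotate onto the eigenvectors of their
    # covariance matrix, and rescale each axis by 1/sqrt(eigenvalue).
    # The transformed data has identity covariance: the ellipsoid
    # becomes a sphere.
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / len(Xc)
    eigvals, eigvecs = np.linalg.eigh(cov)
    return (Xc @ eigvecs) / np.sqrt(eigvals)

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[4.0, 1.5], [1.5, 1.0]], size=1000)
print(np.cov(whiten(X).T))  # ~ identity matrix
```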
17. ▐ Second Order Optimization Methods
Conjugate Gradient
Minimizes the error along a line (exact line search) at each step.
(1) Does not use the Hessian explicitly
(2) It is an O(N) method
(3) Works only for batch training
(4) The gradient changes only in length, not in direction
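A minimal sketch of linear conjugate gradient on a quadratic error E(w) = ½wᵀAw − bᵀw, assuming numpy; the matrix and vector below are illustrative:

```python
import numpy as np

def conjugate_gradient(A, b, n_steps=None):
    # Minimizes E(w) = 0.5*w^T A w - b^T w (A symmetric positive
    # definite) using only matrix-vector products: no explicit
    # Hessian inverse, O(N) storage per iteration.
    w = np.zeros_like(b)
    r = b - A @ w                # negative gradient at w
    p = r.copy()                 # first search direction
    for _ in range(n_steps or len(b)):
        alpha = (r @ r) / (p @ A @ p)     # exact line minimization
        w += alpha * p
        r_new = r - alpha * (A @ p)
        beta = (r_new @ r_new) / (r @ r)  # next direction is conjugate
        p = r_new + beta * p              # to all previous ones
        r = r_new
    return w

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
print(conjugate_gradient(A, b))  # ~ A^{-1} b = [0.2, 0.4]
```

On a quadratic with N weights it converges in at most N line minimizations, which is why it is practical only in batch mode, where exact line searches are possible.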