A presentation on the research paper "Efficient BackProp": how to improve training time for neural networks, with tips and tricks for improving their learning efficiency.
3. Making a neural network work is more of an art than a science
Choices:
Number of nodes
Number of layers
Activation function
Learning rate
And so on
▐ Introduction
4. Trick 1: Stochastic versus Batch learning
Stochastic: update the weights after each training example.
Batch: accumulate the gradient over the whole training set, then update once (both contrasted in the sketch below).
▐ A Few Practical Tricks
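To make the distinction concrete, here is a minimal Python sketch contrasting the two modes on a toy least-squares problem; the data, model, learning rate, and epoch count are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: y = 2x + noise (illustrative assumption).
X = rng.uniform(-1.0, 1.0, size=(100, 1))
y = 2.0 * X[:, 0] + 0.1 * rng.standard_normal(100)

def grad(w, xb, yb):
    # Gradient of the mean squared error 0.5*mean((x.w - y)^2) w.r.t. w.
    return xb.T @ (xb @ w - yb) / len(yb)

eta = 0.1

# Batch learning: one weight update per full pass over the training set.
w_batch = np.zeros(1)
for epoch in range(50):
    w_batch -= eta * grad(w_batch, X, y)

# Stochastic (online) learning: one weight update per training example.
w_sgd = np.zeros(1)
for epoch in range(50):
    for i in rng.permutation(len(y)):
        w_sgd -= eta * grad(w_sgd, X[i:i + 1], y[i:i + 1])

print(w_batch, w_sgd)  # both approach the true weight 2.0
```

Stochastic updates are noisier but far more frequent, which is why they usually train faster on large, redundant datasets; batch updates are smoother and easier to analyze and accelerate.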
7. Trick 4: The Sigmoid
Trick: Use a symmetric sigmoid such as f(x) = 1.7159 tanh((2/3)x); the constants
are chosen so that f(±1) = ±1 with normalized inputs and targets.
▐ A Few Practical Tricks
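A minimal sketch of that sigmoid and its derivative, assuming numpy; only the formula itself comes from the paper:

```python
import numpy as np

def sigmoid(x):
    # Recommended symmetric sigmoid: f(x) = 1.7159 * tanh((2/3) * x).
    # The constants give f(1) = 1 and f(-1) = -1 for normalized data.
    return 1.7159 * np.tanh(2.0 / 3.0 * x)

def sigmoid_prime(x):
    # Derivative, as used in backpropagation.
    return 1.7159 * (2.0 / 3.0) / np.cosh(2.0 / 3.0 * x) ** 2

print(sigmoid(1.0), sigmoid(-1.0))  # ~1.0, ~-1.0
```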
8. Trick 5: Initializing the Weights
Very large or very small weights → small gradients → slow learning.
Weights should lie in the linear region of the sigmoid.
Advantages:
(1) Gradients will be large enough
(2) The linear part is easier for the network to learn
Trick: Initialize the weights from a zero-mean distribution with standard deviation
σ_w = 1/√m, where m = number of inputs (fan-in) of the unit (see the sketch below).
▐ A Few Practical Tricks
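A minimal sketch of that initialization, assuming numpy and a Gaussian (the rule only fixes the mean and standard deviation; the layer sizes below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def init_weights(m, n):
    # Zero-mean weights with standard deviation 1/sqrt(m),
    # where m is the fan-in (number of inputs) of each unit.
    # With normalized inputs this keeps the weighted sums inside
    # the linear region of the sigmoid.
    return rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))

W = init_weights(m=100, n=10)  # a layer with 100 inputs and 10 units
print(W.std())                 # ~0.1 = 1/sqrt(100)
```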
9. Trick 6: Choosing Learning Rates
Approach 1: Adjust the learning rate depending on the change of the weight vector.
Problem: cannot be applied to stochastic or online learning methods.
Approach 2: Maintain a different learning rate for each element of the weight vector,
using the second derivative of the error, so that all weights converge at roughly the same speed.
Trick: Learning rates should be proportional to the square root of the number of
connections sharing that weight (see the sketch after this slide).
▐ A Few Practical Tricks
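A minimal sketch of per-weight learning rates scaled by that sharing rule; the base rate, sharing counts, and gradient are illustrative assumptions:

```python
import numpy as np

base_eta = 0.01
# Number of connections sharing each weight (e.g. > 1 for the
# shared weights of a convolutional layer); illustrative values.
n_shared = np.array([1, 1, 4, 4, 9])
etas = base_eta * np.sqrt(n_shared)  # rate ∝ sqrt(#connections sharing it)

w = np.zeros(5)
g = np.ones(5)   # stand-in gradient for one update step
w -= etas * g    # element-wise update with per-weight rates
print(etas)
```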
14. ▐ Multiple Dimension Gradient
Hessian: a measure of the curvature of E in multiple dimensions, H_ij = ∂²E/∂w_i∂w_j.
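To make the definition concrete, here is a central finite-difference estimate of H_ij on a small quadratic test error, assuming numpy (a sketch only: the O(N²) error evaluations make this far too slow for real networks):

```python
import numpy as np

def hessian(E, w, eps=1e-4):
    # Central-difference estimate of H[i, j] = d^2 E / (dw_i dw_j).
    n = len(w)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            w1 = w.copy(); w1[i] += eps; w1[j] += eps
            w2 = w.copy(); w2[i] += eps; w2[j] -= eps
            w3 = w.copy(); w3[i] -= eps; w3[j] += eps
            w4 = w.copy(); w4[i] -= eps; w4[j] -= eps
            H[i, j] = (E(w1) - E(w2) - E(w3) + E(w4)) / (4 * eps ** 2)
    return H

# Quadratic test error E(w) = w0^2 + 3*w1^2, whose Hessian is [[2, 0], [0, 6]].
E = lambda w: w[0] ** 2 + 3 * w[1] ** 2
print(hessian(E, np.array([1.0, 1.0])))
```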
16. ▐ Second Order Optimization Methods
Newton Algorithm
The whitening transform, well known in signal processing, can convert the
ellipsoidal error surface into a spherical one.
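A minimal sketch of the whitening transform, assuming numpy; the covariance of the sample data below is an illustrative assumption:

```python
import numpy as np

def whiten(X):
    # Center the inputs, rotate onto the eigenvectors of their
    # covariance matrix, and rescale each axis by 1/sqrt(eigenvalue).
    # The transformed data has identity covariance: the ellipsoid
    # becomes a sphere.
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / len(Xc)
    eigvals, eigvecs = np.linalg.eigh(cov)
    return (Xc @ eigvecs) / np.sqrt(eigvals)

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[4.0, 1.5], [1.5, 1.0]], size=1000)
print(np.cov(whiten(X).T))  # ~ identity matrix
```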
17. ▐ Second Order Optimization Methods
Conjugate Gradient
Minimizes the error along a line (exact line search) at each step.
(1) Does not use the Hessian explicitly
(2) It is an O(N) method
(3) Works only for batch training
(4) The gradient changes only in length, not in direction
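A minimal sketch of linear conjugate gradient on a quadratic error E(w) = ½wᵀAw − bᵀw, assuming numpy; the matrix and vector below are illustrative:

```python
import numpy as np

def conjugate_gradient(A, b, n_steps=None):
    # Minimizes E(w) = 0.5*w^T A w - b^T w (A symmetric positive
    # definite) using only matrix-vector products: no explicit
    # Hessian inverse, O(N) storage per iteration.
    w = np.zeros_like(b)
    r = b - A @ w                # negative gradient at w
    p = r.copy()                 # first search direction
    for _ in range(n_steps or len(b)):
        alpha = (r @ r) / (p @ A @ p)     # exact line minimization
        w += alpha * p
        r_new = r - alpha * (A @ p)
        beta = (r_new @ r_new) / (r @ r)  # next direction is conjugate
        p = r_new + beta * p              # to all previous ones
        r = r_new
    return w

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
print(conjugate_gradient(A, b))  # ~ A^{-1} b = [0.2, 0.4]
```

On a quadratic with N weights it converges in at most N line minimizations, which is why it is practical only in batch mode, where exact line searches are possible.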