Activation Functions
Why use an activation in a NN?

A single linear layer computes

$y = x^T W_1 + b$

$W_1 = \begin{pmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \\ w_{31} & w_{32} & w_{33} \end{pmatrix}$
Why use an activation in a NN?

Now stack two linear layers:

$h_1 = x^T W_1 + b_1$

$y = h_1 W_2 + b_2 = (x^T W_1 + b_1) W_2 + b_2 = x^T W_1 W_2 + b_1 W_2 + b_2$

[Diagram: Input $x$ → hidden layer $h_1$ (weights $W_1$) → Output $y$ (weights $W_2$)]

So $y = x^T W + c$ with $W = W_1 W_2$ and $c = b_1 W_2 + b_2$: without a nonlinearity between them, any stack of linear layers collapses into a single linear layer.
Demo
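That collapse is easy to check numerically; a minimal NumPy sketch (the shapes, seed, and values are illustrative, not the original demo):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)                         # input vector
W1, b1 = rng.normal(size=(3, 3)), rng.normal(size=3)
W2, b2 = rng.normal(size=(3, 3)), rng.normal(size=3)

# Two stacked affine layers with no activation in between.
h1 = x @ W1 + b1
y_stacked = h1 @ W2 + b2

# The same map collapsed into one affine layer: W = W1 W2, c = b1 W2 + b2.
W, c = W1 @ W2, b1 @ W2 + b2
y_collapsed = x @ W + c

print(np.allclose(y_stacked, y_collapsed))     # True: the extra depth added no expressive power
```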
Sigmoid

$\sigma(z) = \dfrac{1}{1 + e^{-z}}$

• Squashes real values into the range [0, 1]
• $\dfrac{d\sigma(z)}{dz} = \sigma(z)\,(1 - \sigma(z))$
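A minimal sketch of the function and its derivative (the function names are ours):

```python
import numpy as np

def sigmoid(z):
    # 1 / (1 + e^{-z}): squashes any real z into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    # d(sigma)/dz = sigma(z) * (1 - sigma(z)); its maximum is 0.25, at z = 0.
    s = sigmoid(z)
    return s * (1.0 - s)

print(sigmoid(0.0), sigmoid_grad(0.0))  # 0.5 0.25
```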
Sigmoid

• Squashes real values into the range [0, 1]
• $\dfrac{d\sigma(z)}{dz} = \sigma(z)\,(1 - \sigma(z))$
• $\dfrac{\partial y}{\partial z_3} = \dfrac{\partial a_3}{\partial z_3} = \dfrac{\partial \sigma(z_3)}{\partial z_3} = \sigma(z_3)\,(1 - \sigma(z_3))$

[Diagram: a chain of three sigmoid units, $x \to \sigma(z_1) = a_1 \to \sigma(z_2) = a_2 \to \sigma(z_3) = a_3 = y$]

If a sigmoid neuron is saturated, $\sigma(z_3) \approx 0$ or $\sigma(z_3) \approx 1$, then

$\partial \sigma(z_3)/\partial z_3 \approx 0$

Remember the update rule:

$w = w - \eta \dfrac{\partial \mathcal{L}}{\partial w}$

With a vanishing gradient, the weight barely moves.
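To see the saturation numerically, a short self-contained sketch (the printed values come from this toy computation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# The flatter sigma gets, the smaller the gradient that flows back through it.
for z in [0.0, 2.0, 5.0, 10.0]:
    s = sigmoid(z)
    print(f"z = {z:4.1f}  sigma(z) = {s:.5f}  grad = {s * (1 - s):.2e}")
# z =  0.0  sigma(z) = 0.50000  grad = 2.50e-01
# z = 10.0  sigma(z) = 0.99995  grad = 4.54e-05   -> w -= eta * grad * (...) barely moves w
```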
Why does a sigmoid neuron saturate?

$y = \sigma(z), \quad z = \sum_i x_i w_i$

If the weights (or inputs) are large in magnitude, $|z|$ becomes large and $\sigma(z)$ is pushed into its flat tails near 0 or 1.
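One illustrative sketch of the cause, assuming the saturation comes from large weights (the dimension, seed, and scales are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=10)        # fixed input
w0 = rng.normal(size=10)       # fixed weight direction

for scale in [0.1, 1.0, 10.0]:
    z = x @ (scale * w0)       # |z| grows linearly with the weight scale
    s = 1.0 / (1.0 + np.exp(-z))
    print(f"scale = {scale:5.1f}  z = {z:8.2f}  sigma(z) = {s:.4f}  grad = {s * (1 - s):.2e}")
# Larger weights -> larger |z| -> sigma(z) pinned near 0 or 1 -> vanishing gradient.
```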
Problem with sigmoid

• The gradient vanishes when the neuron saturates
• Expensive to compute (uses the exponential function)
• Not zero-centered

$y = \sigma(z_3), \quad z_3 = a_1 w_1 + a_2 w_2$

$\nabla w_1 = \dfrac{\partial \mathcal{L}}{\partial y} \dfrac{\partial y}{\partial z_3} \dfrac{\partial z_3}{\partial w_1}, \qquad \nabla w_2 = \dfrac{\partial \mathcal{L}}{\partial y} \dfrac{\partial y}{\partial z_3} \dfrac{\partial z_3}{\partial w_2}$

[Diagram: inputs $x_1, x_2$ feed $\sigma(z_1), \sigma(z_2)$; their outputs $a_1, a_2$ feed $\sigma(z_3)$ through weights $w_1, w_2$]
Problem with sigmoid

$z_3 = a_1 w_1 + a_2 w_2$, and since $\partial z_3 / \partial w_i = a_i$:

$\nabla w_1 = \dfrac{\partial \mathcal{L}}{\partial y} \dfrac{\partial y}{\partial z_3}\, a_1, \qquad \nabla w_2 = \dfrac{\partial \mathcal{L}}{\partial y} \dfrac{\partial y}{\partial z_3}\, a_2$

The common factor $\dfrac{\partial \mathcal{L}}{\partial y} \dfrac{\partial y}{\partial z_3}$ is either negative or positive, while $a_1, a_2 \ge 0$ because they are sigmoid outputs.

So the gradients of the loss w.r.t. all weights into a particular hidden unit are either all positive or all negative. This restricts the possible update directions: reaching a target that needs mixed-sign updates forces a zig-zag path.
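A toy numerical illustration of the sign argument (the scalar standing in for $\partial \mathcal{L}/\partial y \cdot \partial y/\partial z_3$ is made up):

```python
import numpy as np

a = np.array([0.7, 0.2])   # a1, a2: sigmoid outputs from the previous layer, always > 0
common = -0.3              # placeholder for dL/dy * dy/dz3 (one signed scalar, shared)

grad_w = common * a        # nabla w_i = (dL/dy * dy/dz3) * a_i
print(grad_w)              # [-0.21 -0.06]: the same sign for every weight of the unit
```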
TanH

• Squashes real values into [-1, 1]
• Zero-centered
• $\dfrac{\partial \tanh x}{\partial x} = 1 - \tanh^2 x$
• Still has the vanishing-gradient problem
• Expensive to compute
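A quick check of these properties (the printed values come from this computation):

```python
import numpy as np

def tanh_grad(z):
    # d(tanh)/dz = 1 - tanh(z)^2; its maximum is 1, at z = 0 (sigmoid's peaks at 0.25).
    return 1.0 - np.tanh(z) ** 2

print(np.tanh(0.0), tanh_grad(0.0))  # 0.0 1.0: zero-centered output, stronger peak gradient
print(tanh_grad(5.0))                # ~1.8e-04: but it still vanishes in the tails
```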
ReLU

$\mathrm{relu}(x) = \max(0, x)$

• Gradient does not saturate (for $x > 0$)
• Computationally efficient
• Converges faster than sigmoid and tanh
• Not zero-centered
• Dead-neuron problem
Dead Neuron

$\dfrac{\partial\, \mathrm{relu}(x)}{\partial x} = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{otherwise} \end{cases}$

A neuron is said to be dead if its weights are not updated during training.

[Diagram: inputs $x_1, x_2$ and bias input $1$ feed $z_1$ through $w_1, w_2, b$]

$y = \mathrm{relu}(z_1)$, where $z_1 = x_1 w_1 + x_2 w_2 + b$.

If $z_1 = x_1 w_1 + x_2 w_2 + b < 0$ (e.g., because $b \ll 0$), then

$\nabla w_1 = \dfrac{\partial \mathcal{L}}{\partial y} \dfrac{\partial y}{\partial z_1} \dfrac{\partial z_1}{\partial w_1} = 0 \quad \text{because} \quad \dfrac{\partial y}{\partial z_1} = 0$

Remember: a large number of ReLU neurons die during training if the learning rate is too high.
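A toy sketch of how a neuron dies, assuming its bias has been pushed very negative (all numbers invented for illustration):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def relu_grad(z):
    return float(z > 0)

rng = np.random.default_rng(2)
w, b = np.array([0.4, -0.7]), -50.0          # b << 0, e.g. after one oversized update

for _ in range(3):
    x = rng.normal(size=2)
    z1 = x @ w + b                           # z1 < 0 for essentially every input
    # nabla w1 = dL/dy * dy/dz1 * dz1/dw1, and dy/dz1 = relu_grad(z1) = 0
    print(f"z1 = {z1:7.2f}  y = {relu(z1):.1f}  relu_grad = {relu_grad(z1):.1f}")
# Every gradient through this unit is 0, so w and b never change: the neuron is dead.
```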
Leaky ReLU

$\mathrm{leaky\_relu}(x) = \max(0.1x,\; x)$

• No saturation
• No dead-neuron problem
• Computationally efficient
Parametric ReLU

$\mathrm{prelu}(x) = \max(\alpha x,\; x)$

All the benefits of Leaky ReLU, with the slope $\alpha$ learned during training instead of being fixed.
Exponential Linear Unit (ELU)

$\mathrm{ELU}(x) = \begin{cases} x & x > 0 \\ \alpha\,(e^{x} - 1) & x \le 0 \end{cases}$

All the benefits of Leaky ReLU, but expensive to compute (uses the exponential function).
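For comparison, one sketch of the three ReLU variants above; the leak slope 0.1 follows the slide, while the $\alpha$ values passed to `prelu` and `elu` here are placeholder assumptions:

```python
import numpy as np

def leaky_relu(x, slope=0.1):
    return np.maximum(slope * x, x)

def prelu(x, alpha):
    # Same shape as leaky ReLU, but alpha is a parameter learned with the weights.
    return np.maximum(alpha * x, x)

def elu(x, alpha=1.0):
    # Smooth negative side alpha * (e^x - 1); the exp is what makes it pricier.
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

xs = np.array([-2.0, -0.5, 0.0, 3.0])
print(leaky_relu(xs))          # [-0.2   -0.05   0.     3.  ]
print(prelu(xs, alpha=0.25))   # [-0.5   -0.125  0.     3.  ]
print(elu(xs))                 # ~[-0.865 -0.393  0.     3.  ]
```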
Maxout activation

$\mathrm{maxout}(x) = \max(x^T w_1 + b_1,\; x^T w_2 + b_2)$

• No saturation
• No dead-neuron problem
• Increases the number of parameters (doubled here, with two linear pieces per unit)
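A minimal sketch of one maxout unit with $k = 2$ pieces, matching the formula above (the weights are random placeholders):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=4)

# Two affine pieces per maxout unit (k = 2): twice the parameters of one linear unit.
w1, b1 = rng.normal(size=4), 0.1
w2, b2 = rng.normal(size=4), -0.2

out = np.maximum(x @ w1 + b1, x @ w2 + b2)   # piecewise linear; gradient never saturates
print(out)
```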
Conclusion

• Sigmoid and tanh suffer from the vanishing-gradient problem (avoid them in deep networks).
• ReLU is the most popular activation, but it can lead to the dead-neuron problem.
• Variants of ReLU require careful tuning of their hyperparameters.