Case Study of CNN
from LeNet to ResNet
NamHyuk Ahn @ Ajou Univ.
2016. 03. 09
Convolutional Neural Network
Convolution Layer
- Convolve the image with a filter (a 3-dim dot product at each location)
- Stack multiple filters in one layer (see the blue and green outputs,
each called a channel)
Convolution Layer
- Local Connectivity
• Instead of connecting every pixel to every neuron, connect
only a local region of the input (called the receptive field)
• This greatly reduces the number of parameters
- Parameter sharing
• To reduce parameters further, every spatial location of an output
channel shares the same filter (# of filters == # of output channels)
Convolution Layer
- Example) 1st conv layer in AlexNet
• Input: [224, 224], 96 filters of [11x11x3], stride 4, output: [55, 55]
- Each filter extracts a different feature (e.g. horizontal
edges, vertical edges…)
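A minimal sketch of the arithmetic behind the numbers above (assuming the commonly used 227x227 crop and stride 4 for AlexNet's first layer, which the slide does not state):

```python
# Sketch: spatial output size of a conv layer (square input/filter assumed).
def conv_output_size(input_size, filter_size, stride, padding=0):
    return (input_size - filter_size + 2 * padding) // stride + 1

# AlexNet's first conv layer: 11x11 filters, stride 4, no padding.
# The paper quotes a 224x224 input, but the 227x227 crop used in practice
# is what yields the 55x55 output: (227 - 11) / 4 + 1 = 55.
print(conv_output_size(227, 11, stride=4))  # -> 55
```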
Pooling Layer
- Downsample feature maps to reduce parameters and computation
- Usually max pooling (take the maximum value in each region)
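A minimal sketch of 2x2 max pooling with stride 2 on a single-channel feature map (NumPy assumed):

```python
import numpy as np

# 2x2 max pooling, stride 2: keep the maximum of every non-overlapping 2x2 region.
def max_pool_2x2(x):
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool_2x2(fmap))  # [[ 5.  7.] [13. 15.]] -- a 2x downsampled map
```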
ReLU, FC Layer
- ReLU
• A kind of activation function (others: sigmoid, tanh…)
- Fully-connected Layer
• Same as in an ordinary neural network
Convolutional Neural Network
Training CNN
1. Compute the loss function with forward-prop
2. Optimize the parameters w.r.t. the loss function with back-prop
• Use a gradient descent method (SGD)
• The gradient of each weight is computed with the chain rule of partial derivatives
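A small runnable sketch of this loop, using logistic regression as a stand-in "network" (the data and labels are made up for illustration): forward-prop computes the loss, the chain rule gives dL/dW, and SGD updates the weights.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 10))                   # one mini-batch of toy inputs
y = (X[:, 0] > 0).astype(float)                 # toy labels
W = 0.01 * rng.normal(size=10)                  # small random init
lr = 0.1

for step in range(100):
    z = X @ W                                   # forward-prop
    p = 1.0 / (1.0 + np.exp(-z))
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    dW = X.T @ (p - y) / len(y)                 # back-prop (chain rule)
    W -= lr * dW                                # SGD update
print(round(loss, 3))                           # loss decreases over the steps
```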
ILSVRC trend
AlexNet (2012)
(ILSVRC 2012 winner)
AlexNet
- ReLU
- Data augmentation
- Dropout
- Ensemble of CNNs (1 CNN: 18.2%, 7 CNNs: 15.4% top-5 error)
AlexNet
- Other techniques (not covered today)
• SGD + momentum (+ mini-batch)
• Multi-GPU training
• Weight Decay
• Local Response Normalization
Problems of sigmoid
- Gradient vanishing
• When the gradient passes through a sigmoid, it can vanish
because the local gradient of the sigmoid can be almost zero
- Output is not zero-centered
• Gradients on the weights then all share the same sign, which slows convergence
ReLU
- SGD converges faster than with sigmoid-like activations
- Computationally cheap
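A small sketch contrasting the local gradients of sigmoid and ReLU, which is the point behind the vanishing-gradient slide above:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)            # at most 0.25, nearly 0 for large |x|

def relu_grad(x):
    return (x > 0).astype(float)    # exactly 1 for every active unit

x = np.array([-10.0, 0.0, 10.0])
print(sigmoid_grad(x))  # [~0.000045, 0.25, ~0.000045] -- shrinks layer after layer
print(relu_grad(x))     # [0., 0., 1.] -- no shrinkage where units are active
```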
Data augmentation
- Randomly crop the [256, 256] images to [224, 224] (plus horizontal flips)
- At test time, take 5 crops (4 corners + center) and average the predictions
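A minimal sketch of the training-time augmentation (random 224x224 crop plus a random horizontal flip); the image is assumed to be an HxWxC NumPy array:

```python
import numpy as np

def random_crop_flip(img, crop=224):
    h, w = img.shape[:2]
    top = np.random.randint(0, h - crop + 1)        # random crop position
    left = np.random.randint(0, w - crop + 1)
    patch = img[top:top + crop, left:left + crop]
    if np.random.rand() < 0.5:                      # random horizontal flip
        patch = patch[:, ::-1]
    return patch

patch = random_crop_flip(np.zeros((256, 256, 3)))   # -> shape (224, 224, 3)
```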
Dropout
- Similar to bagging (an approximation of bagging)
- Acts like a regularizer (reduces overfitting)
- Instead of using all neurons, “drop out” some neurons
randomly (usually with probability 0.5)
Dropout
• At test time, do not drop neurons; instead scale their outputs (usually by 0.5)
• The scaled output is the expected value of each neuron
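A minimal sketch of the behaviour described on these two slides (drop units with probability 0.5 during training, scale by the keep probability at test time):

```python
import numpy as np

p_keep = 0.5

def dropout(x, train=True):
    if train:
        mask = np.random.rand(*x.shape) < p_keep   # randomly "drop out" ~half the neurons
        return x * mask
    return x * p_keep                              # test time: expected value of each neuron
```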
Architecture
- conv - pool - … - fc - softmax (similar to LeNet)
- Uses large filters (e.g. 11x11)
Architecture
- Weights must be initialized randomly
• If not, all neurons receive the same gradients (symmetry is never broken)
• Usually a Gaussian distribution with std = 0.01
- Use mini-batch SGD with momentum to update the weights
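A one-line sketch of the AlexNet-style random initialization mentioned above:

```python
import numpy as np

# Gaussian initialization with std = 0.01; if all weights started equal,
# every neuron would get identical gradients and learn the same thing.
def init_weights(fan_in, fan_out, std=0.01):
    return std * np.random.randn(fan_in, fan_out)
```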
VGGNet (2014)
(ILSVRC 2014 2nd)
VGGNet
- Uses small kernels (always 3x3)
• Stacking them gives multiple non-linearities (e.g. ReLU) over the same receptive field
• Fewer weights to train (see the comparison below)
- Heavier data augmentation than AlexNet
- Ensemble of 7 models (ILSVRC submission: 7.3% top-5 error)
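A quick check of the "fewer weights" claim, assuming C input and C output channels and ignoring biases: three stacked 3x3 layers cover the same 7x7 region as one 7x7 layer, with fewer parameters and two extra ReLUs in between.

```python
C = 64                                   # channels in and out (assumption for illustration)
params_7x7 = 7 * 7 * C * C               # one 7x7 conv layer
params_3x3_stack = 3 * (3 * 3 * C * C)   # three stacked 3x3 conv layers, same receptive field
print(params_7x7, params_3x3_stack)      # 200704 vs 110592
```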
Architecture
- Most of the memory is used in the early conv layers; most of the parameters are in the fc layers
GoogLeNet - Inception v1 (2014)
(ILSVRC 2014 winner)
GoogLeNet
Inception module
- Uses 1x1, 3x3 and 5x5 convs simultaneously
to capture a variety of structures
- Dense structure is captured by the 1x1 conv,
more spread-out structure by the 3x3 and 5x5 convs
- Computationally expensive
• Uses 1x1 conv layers to reduce dimensionality
(details later, in the ResNet section)
Auxiliary Classifiers
- A deep network raises concern about how effectively
gradients propagate in backprop
- The auxiliary losses are added to the total loss (weighted by
0.3) and removed at test time
Average Pooling
- Proposed in Network in Network (also used in
GoogLeNet)
- Problems of the fc layer
• Needs lots of parameters, easy to overfit
- Replace the fc layer with global average pooling
Average Pooling
- Make the number of channels in the last conv equal to the number of classes
- Average each channel and pass the result to softmax
- Reduces overfitting
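A minimal sketch of global average pooling feeding softmax, for a [C, H, W] feature map where C equals the number of classes:

```python
import numpy as np

def global_avg_pool(fmap):
    return fmap.mean(axis=(1, 2))          # one scalar per channel (per class)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

fmap = np.random.randn(10, 7, 7)           # last conv output: 10 classes, 7x7 spatial
probs = softmax(global_avg_pool(fmap))     # class probabilities, no fc layer needed
```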
MSRA ResNet (2015)
(ILSVRC 2015 winner)
Before ResNet…
- Need to know about
• PReLU
• Xavier Initialization
• Batch Normalization
PReLU
- An adaptive version of ReLU
- The slope for x < 0 is learned during training
- Only slightly more parameters (# of layers x # of channels)
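A one-line sketch of PReLU; `a` is the learned slope (one per channel in the paper), trained by backprop like any other parameter:

```python
import numpy as np

def prelu(x, a):
    return np.where(x > 0, x, a * x)   # a = 0 gives ReLU, a fixed small a gives leaky ReLU

print(prelu(np.array([-2.0, 3.0]), a=0.25))  # [-0.5  3. ]
```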
Xavier Initialization
- If initialized from a Gaussian with a small std, the outputs of neurons
shrink toward zero as the network gets deep
- If the std is increased (e.g. 1.0), the outputs saturate to -1 or 1
- Xavier init scales the initial values by the number of input
neurons
- Looks fine, but this derivation assumes a linear
activation, so it cannot be used in ReLU-like networks
[Figures: per-layer activation distributions — with a small std the outputs vanish toward zero, with a large std they saturate to -1 or 1; follow-up plots compare Xavier initialization with the "Xavier / 2" (He) variant]
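A sketch of the two initializations compared in the plots above; the "/ 2" (He) variant halves the fan-in to compensate for ReLU zeroing half of the activations, which is why it remains usable in ReLU-like networks:

```python
import numpy as np

def xavier_init(fan_in, fan_out):
    # variance ~ 1 / fan_in, derived assuming a linear (or tanh-like) activation
    return np.random.randn(fan_in, fan_out) / np.sqrt(fan_in)

def he_init(fan_in, fan_out):
    # "Xavier / 2": variance ~ 2 / fan_in, accounts for ReLU killing half the units
    return np.random.randn(fan_in, fan_out) / np.sqrt(fan_in / 2.0)
```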
Batch Normalization
- Make each layer's output roughly Gaussian, but full
normalization (whitening) would cost too much
• Compute mean and variance per dimension (dimensions assumed uncorrelated)
• Compute mean and variance over the mini-batch (not the entire set)
- Plain normalization constrains the non-linearity and constrains the
network through the uncorrelated-dimensions assumption
• So the normalized output is linearly transformed (the scale and shift factors are learned parameters)
Batch Normalization
- At test time, use the mean and variance of the entire training set
(estimated with a moving average)
- BN acts like a regularizer (Dropout becomes unnecessary)
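A sketch of the training-time batch-norm transform described above; gamma and beta are the learned linear-transform factors, and at test time the batch statistics would be replaced by the moving averages:

```python
import numpy as np

def batchnorm_train(x, gamma, beta, eps=1e-5):
    # x: [batch, dims]; each dimension is normalized independently (assumed uncorrelated)
    mu = x.mean(axis=0)                      # mini-batch mean
    var = x.var(axis=0)                      # mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)    # normalize
    return gamma * x_hat + beta              # learned scale and shift restore expressiveness
```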
ResNet
ResNet
Problem of degradation
- More depth can give more accuracy, but in deep networks
gradients can vanish/explode
• BN, Xavier init and Dropout can handle this (up to ~30 layers)
- Going even deeper, the degradation problem occurs
• Not just overfitting: the training error also increases
Deep Residual Learning
- Element-wise addition of F(x) and the shortcut
connection, then pass through a ReLU non-linearity
- When the dims of x and F(x) differ (the channel count changes),
linearly project x to match (done by a 1x1 conv)
- Similar in spirit to the shortcuts in LSTM
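A rough sketch of the residual block's forward pass; `conv1`, `conv2` and `proj` stand for convolution operations (hypothetical callables here), with `proj` being the 1x1 projection used when the channel counts of x and F(x) differ:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def residual_block(x, conv1, conv2, proj=None):
    f = conv2(relu(conv1(x)))                        # F(x): the residual branch
    shortcut = proj(x) if proj is not None else x    # identity shortcut or 1x1 projection
    return relu(f + shortcut)                        # element-wise addition, then ReLU

# Tiny demo with identity "convs" just to show the data flow:
x = np.array([-1.0, 2.0, 3.0])
print(residual_block(x, conv1=lambda t: t, conv2=lambda t: t))  # relu(F(x) + x)
```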
Deeper Bottleneck
- To reduce training time, the block is modified into a bottleneck design
(purely for economical reasons)
• (3x3x64)x64 + (3x3x64)x64 = 73,728 weights (left)
• (1x1x256)x64 + (3x3x64)x64 + (1x1x64)x256 = 69,632 weights (right)
• The right block is wider (more channels) but has a similar number of parameters
• A similar method is also used in GoogLeNet
ResNet
- Data augmentation as in AlexNet
- Batch Normalization (no Dropout)
- Xavier / 2 (He) initialization
- Global average pooling
- Structure follows the VGGNet style
Conclusion
[Chart: ILSVRC top-5 error by model]
- AlexNet (2012): 15.31%
- VGGNet (2014): 7.32%
- Inception-V1 (2014): 6.66%
- Human: 5.1%
- PReLU-net (2015): 4.94%
- BN-Inception (2015): 4.82%
- ResNet-152 (2015): 3.57%
- Inception-ResNet (2016): 3.1%
Conclusion
- Dropout, BN
- ReLU-like activations (e.g. PReLU, ELU…)
- Xavier initialization
- Average pooling
- Use pre-trained models :)
Reference
- Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep
convolutional neural networks." Advances in neural information processing systems. 2012.
- Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image
recognition." arXiv preprint arXiv:1409.1556 (2014).
- Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." arXiv preprint arXiv:1312.4400 (2013).
- He, Kaiming, et al. "Delving deep into rectifiers: Surpassing human-level performance on imagenet
classification." Proceedings of the IEEE International Conference on Computer Vision. 2015.
- He, Kaiming, et al. "Deep Residual Learning for Image Recognition." arXiv preprint arXiv:1512.03385
(2015).
- Szegedy, Christian, Sergey Ioffe, and Vincent Vanhoucke. "Inception-v4, Inception-ResNet and the
Impact of Residual Connections on Learning." arXiv preprint arXiv:1602.07261 (2016).
- Gu, Jiuxiang, et al. "Recent Advances in Convolutional Neural Networks." arXiv preprint arXiv:1512.07108 (2015). (good as a tutorial)
- Thanks also to CS231n; some figures are from the CS231n lecture slides. See http://cs231n.stanford.edu/index.html