3. 1.1 Definition
• A CNN is a specialized kind of neural network for processing
data that has a known, grid-like topology, such as time-series
data (1D grid), image data (2D grid), etc.
• A CNN is typically trained as a supervised deep learning model and
is used in various fields such as speech recognition, image retrieval,
and face recognition.
School of Computer Science and Engineering
4. 1.1 Definition
• ImageNet Classification with Deep Convolutional Neural Networks
(Cited by 9538, NIPS 2012, Alex Krizhevsky, Ilya Sutskever, Geoffrey
E. Hinton)
• Builds a CNN with 60 million parameters and 650,000 neurons,
consisting of five convolutional layers followed by three fully connected layers.
• A typical CNN is a 5-layer architecture consisting of convolution layers,
pooling layers, and a classification layer.
• Convolution layer: extracts features from the input image
• Pooling layer: reduces the dimensionality
• A CNN is generally trained using the back-propagation algorithm
5. 1.2 Motivation
• MLPs do not scale well to large images
• MLPs ignore the correlation between neighboring pixels
• MLPs are not robust to image transformations
MLP: multi-layer perceptron
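To make the scaling point concrete, the sketch below compares the parameter count of one fully connected layer with that of one convolutional layer. The image size (224x224x3), the hidden width (1000 units), and the kernel configuration (64 filters of size 3x3) are illustrative assumptions, not values taken from the slides.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    """Weights plus biases of a 2D convolutional layer.

    The count does not depend on the image size, because the same
    kernel is reused at every spatial position.
    """
    return kernel_h * kernel_w * in_channels * out_channels + out_channels


# Assumed input: a 224x224 RGB image (hypothetical size).
height, width, channels = 224, 224, 3

mlp = dense_params(height * width * channels, 1000)
cnn = conv_params(3, 3, channels, 64)

print(mlp)  # 150529000
print(cnn)  # 1792
```

The fully connected layer grows with the number of pixels, while the convolutional layer only grows with the kernel size and the number of channels.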
6. 2.1 Why Convolution?
• Preserves the spatial relationship
between pixels by learning image
features using small squares of
input data
• Detects small, meaningful features
such as edges with kernels
A 2D convolution example from the Deep Learning book
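A minimal sketch of how a small kernel picks out an edge, assuming a "valid" convolution (no padding, stride 1, no kernel flipping, as most deep learning libraries implement it). The toy image and the edge kernel are hypothetical values chosen for illustration.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image without padding or kernel flipping."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the kh x kw patch whose top-left corner is (i, j).
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# Toy image: dark left half (0), bright right half (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical edge kernel: responds where intensity rises from left to right.
kernel = [
    [-1, 1],
    [-1, 1],
]

edges = conv2d_valid(image, kernel)
print(edges)  # [[0, 2, 0], [0, 2, 0]]
```

The output is non-zero only in the column where the dark-to-bright transition occurs, which is the sense in which a kernel "detects" an edge.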
10. 3 ReLU
• Introduces non-linearity
Other non-linear functions such as tanh or sigmoid can also be used instead
of ReLU, but ReLU has been found to perform better in most situations.
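For reference, the three activation functions mentioned here can be sketched in a few lines of standard-library Python. The sample inputs are arbitrary illustrative values.

```python
import math


def relu(x):
    """Rectified linear unit: passes positive values, zeroes out the rest."""
    return max(0.0, x)


def sigmoid(x):
    """Logistic sigmoid: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


inputs = [-2.0, -0.5, 0.0, 1.5]

relu_out = [relu(x) for x in inputs]
tanh_out = [math.tanh(x) for x in inputs]      # values in (-1, 1)
sigmoid_out = [sigmoid(x) for x in inputs]     # values in (0, 1)

print(relu_out)  # [0.0, 0.0, 0.0, 1.5]
```

Unlike tanh and sigmoid, ReLU does not saturate for large positive inputs, which is one commonly cited reason for its good behavior during training.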
11. 4.1 Motivation of Pooling
• Reduce dimensionality
• In all cases, pooling helps make the representation
approximately invariant to small translations of the input.
• Invariance to local translation can be a very useful property if we care more about
whether some feature is present than exactly where it is.
• Types of pooling
• Max (often works better in practice)
• Average
• Sum
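The three pooling types listed above differ only in how each window is reduced to a single value. A minimal sketch, assuming non-overlapping 2x2 windows (window size equal to stride) and a made-up 4x4 feature map:

```python
def pool2d(feature_map, size, reduce_fn):
    """Apply reduce_fn to non-overlapping size x size windows."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            row.append(reduce_fn(window))
        pooled.append(row)
    return pooled


def mean(values):
    return sum(values) / len(values)


# Hypothetical 4x4 feature map.
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 2, 7, 8],
]

max_pooled = pool2d(feature_map, 2, max)   # [[4, 2], [2, 8]]
avg_pooled = pool2d(feature_map, 2, mean)  # [[2.5, 1.0], [0.75, 6.5]]
sum_pooled = pool2d(feature_map, 2, sum)   # [[10, 4], [3, 26]]
```

In every case the 4x4 map shrinks to 2x2, which is the dimensionality reduction referred to above.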
13. 5 Example by TensorFlow
Input image: 28 x 28
14. 5 Example by TensorFlow
15. 5 Example by TensorFlow
• Zero-padding the 28x28x1 image to 32x32x1
• Applying a 5x5x1x32 convolution to get 28x28x32
• Max-pooling down to 14x14x32
• Zero-padding the 14x14x32 to 18x18x32
• Applying a 5x5x32x64 convolution to get 14x14x64
• Max-pooling down to 7x7x64
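The shapes above can be checked with plain arithmetic. This is a sketch of the shape bookkeeping only, not the TensorFlow code from the slides; it assumes a padding of 2 pixels per side, stride-1 convolutions without further padding, and 2x2 max-pooling with stride 2, which are the settings implied by the listed sizes. The helper names are hypothetical.

```python
def pad(shape, amount):
    """Zero-pad height and width by `amount` pixels on every side."""
    h, w, c = shape
    return (h + 2 * amount, w + 2 * amount, c)


def conv_valid(shape, kernel_size, out_channels):
    """Stride-1 convolution with no extra padding."""
    h, w, _ = shape
    return (h - kernel_size + 1, w - kernel_size + 1, out_channels)


def max_pool(shape, size=2):
    """Non-overlapping size x size pooling."""
    h, w, c = shape
    return (h // size, w // size, c)


shape = (28, 28, 1)
shapes = [shape]
steps = (
    lambda s: pad(s, 2),             # 28x28x1  -> 32x32x1
    lambda s: conv_valid(s, 5, 32),  # 32x32x1  -> 28x28x32
    max_pool,                        # 28x28x32 -> 14x14x32
    lambda s: pad(s, 2),             # 14x14x32 -> 18x18x32
    lambda s: conv_valid(s, 5, 64),  # 18x18x32 -> 14x14x64
    max_pool,                        # 14x14x64 -> 7x7x64
)
for step in steps:
    shape = step(shape)
    shapes.append(shape)

print(shapes[-1])  # (7, 7, 64)
```

Padding by 2 before a 5x5 convolution is what keeps the spatial size unchanged, so only the pooling steps halve the height and width.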
16. 5 Example by TensorFlow
For example, when determining whether an image contains a face, we need not know the location
of the eyes with pixel-perfect accuracy; we just need to know that there is an eye on
the left side of the face and an eye on the right side of the face.
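The same idea can be shown numerically. In this sketch, a hypothetical feature detector produces one strong response along a row, and max-pooling with non-overlapping windows of size 3 (an assumed setting) gives the same output when that response moves by one position.

```python
def max_pool_1d(values, size):
    """Max over non-overlapping windows of the given size."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]


# Hypothetical detector responses along one row: a strong response at index 4.
original = [0, 0, 0, 0, 9, 0, 0, 0, 0]
# The same response shifted one position to the right.
shifted = [0, 0, 0, 0, 0, 9, 0, 0, 0]

pooled_original = max_pool_1d(original, 3)  # [0, 9, 0]
pooled_shifted = max_pool_1d(shifted, 3)    # [0, 9, 0]

print(pooled_original == pooled_shifted)  # True
```

The invariance is only approximate: a shift that moves the response across a window boundary would change the pooled output.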