2. Why CNN?
The Convolutional Neural Network (CNN or
ConvNet) is a subtype of Neural Networks that
is mainly used for applications in image and
speech recognition.
Its convolutional layers reduce the high
dimensionality of images while preserving the
information that matters for recognition. That is why
CNNs are especially suited to this use case.
6. VGG Net
● It is a classical CNN architecture developed to increase the depth of CNNs in order to
improve model performance.
● VGG stands for Visual Geometry Group. VGG Net is a deep CNN model with multiple
layers, typically 16 or 19.
● It is a convolutional neural network model proposed by A. Zisserman and K.
Simonyan from the University of Oxford.
● The VGG16 model achieves almost 92.7% top-5 test accuracy on
ImageNet. (ImageNet is a dataset of more than 14 million images;
the benchmark subset used to evaluate VGG has 1000 classes.)
7. VGG Architecture:
● It has 16 weight layers (13 convolutional layers
and 3 fully connected layers).
● Input: takes an RGB image of size 224 by
224.
● Convolutional Layers: use 3 by 3 filters
with a stride of 1, each followed by a
ReLU (rectified linear unit)
activation function.
● Hidden Layers: all the hidden layers in
VGG use ReLU.
● Fully Connected Layers: there are 3 fully
connected layers; the first two have 4096
channels each, and the third has 1000
channels, 1 for each class.
8. ● The architecture stacks a few convolution layers
followed by a pooling layer that halves the height and the
width (reducing volume), and repeats this pattern.
● VGG16 uses 3x3 convolution filters with stride 1
and "same" padding throughout, and 2x2 max-pooling layers
with stride 2.
● The number of filters starts at 64 and doubles after
pooling to 128 and then to 256. The last layers use 512 filters.
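The layer pattern described above can be traced with a few lines of Python. This is a minimal sketch, assuming the standard VGG16 layout of five blocks with 2, 2, 3, 3 and 3 convolutions; the names `VGG16_BLOCKS` and `trace_shapes` are illustrative and do not come from any library.

```python
# Each entry: (number of 3x3 conv layers in the block, number of filters).
# Assumption: the standard VGG16 layout of five blocks.
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def trace_shapes(blocks, size=224):
    """Return the (height, width, channels) after each block's max-pool."""
    shapes = []
    for _num_convs, filters in blocks:
        # 3x3 conv, stride 1, "same" padding: height and width are unchanged.
        # 2x2 max-pool, stride 2: height and width are halved.
        size = size // 2
        shapes.append((size, size, filters))
    return shapes


shapes = trace_shapes(VGG16_BLOCKS)
for block_index, shape in enumerate(shapes, start=1):
    print(f"block {block_index}: {shape}")

# Number of values fed into the first fully connected layer.
flattened = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
print("flattened features:", flattened)
```

Because every convolution keeps the spatial size, only the five pooling layers shrink the 224 by 224 input, which ends as a 7 by 7 map with 512 channels.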
9. VGG 19:
● The VGG19 model (also VGGNet-19) follows the same design as
VGG16 except that it has 19 layers.
● The “16” and “19” stand for the number of weight
layers in the model (convolutional plus fully connected layers).
● VGG19 has three more
convolutional layers than VGG16.
10. VGG 16 vs VGG 19
VGG 16:
● 16 layers
● Has fewer weights
● Network size, driven largely by the fully connected nodes: 533 MB
VGG 19:
● 19 layers
● Has more weights
● Network size, driven largely by the fully connected nodes: 574 MB
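The layer counts and weights compared above can be checked by counting parameters directly. The sketch below assumes the standard layouts (five blocks of 3x3 convolutions, then fully connected layers of 4096, 4096 and 1000 units) and 4 bytes per weight; all names are illustrative.

```python
# Conv layers per block; the filters per block are the same for both networks.
# Assumption: the standard layouts of the two networks.
CONVS_PER_BLOCK = {"VGG16": [2, 2, 3, 3, 3], "VGG19": [2, 2, 4, 4, 4]}
FILTERS_PER_BLOCK = [64, 128, 256, 512, 512]
FC_UNITS = [4096, 4096, 1000]


def count_layers_and_params(convs_per_block):
    """Return (weight_layers, parameters) for a VGG-style network."""
    params = 0
    in_channels = 3  # RGB input
    for num_convs, filters in zip(convs_per_block, FILTERS_PER_BLOCK):
        for _ in range(num_convs):
            params += 3 * 3 * in_channels * filters + filters  # weights + biases
            in_channels = filters
    in_features = 7 * 7 * 512  # 224 halved by five 2x2 pools, 512 channels
    for units in FC_UNITS:
        params += in_features * units + units  # weights + biases
        in_features = units
    weight_layers = sum(convs_per_block) + len(FC_UNITS)
    return weight_layers, params


for name, convs in CONVS_PER_BLOCK.items():
    num_layers, num_params = count_layers_and_params(convs)
    size_mb = num_params * 4 / 1e6  # assumption: 4 bytes per float32 weight
    print(f"{name}: {num_layers} weight layers, {num_params:,} parameters, "
          f"about {size_mb:.0f} MB")
```

This float32 estimate need not match the sizes quoted above exactly, since a stored model file depends on the format used. It does show that most of the weights sit in the fully connected layers.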
11. Advantage of VGG 19 over VGG 16
● The main advantage of VGG19 over VGG16 is that it has more layers, which can enable it to learn more
complex representations of the data.
● VGG19 can be slightly more accurate than VGG16, at the cost of more parameters and computation.
In Conclusion:
● VGG16 and VGG19 are both convolutional neural networks developed by the Visual Geometry Group (VGG)
at the University of Oxford. Both are trained for image classification tasks.
● The main difference between them is the number of layers: VGG16 is a 16-layer CNN, while VGG19 is a 19-layer CNN.
● VGG19 can be slightly more accurate than VGG16.
12. DATA AUGMENTATION
Artificially increasing the training
set by creating modified copies of
existing data.
This includes making minor changes
to the dataset or using deep
learning to generate new data
points.
13. AUGMENTED DATA
It is derived from original data with
some minor changes to increase the
size and diversity of the training
set.
SYNTHETIC DATA
It is generated artificially without
using the original dataset. It often
uses DNNs (Deep Neural
Networks) and GANs (Generative
Adversarial Networks) to
generate synthetic data.
14. WHY SHOULD WE USE DATA AUGMENTATION?
➢ To prevent models from overfitting.
➢ To compensate for an initial training set that is too small.
➢ To improve the model accuracy.
➢ To reduce the operational cost of labeling and cleaning the raw dataset.
➢ To increase the generalization ability of the models.
➢ To help resolve class imbalance issues in classification.
15. LIMITATIONS OF DATA AUGMENTATION
➢ The biases in the original dataset persist in the augmented data.
➢ Quality assurance for data augmentation is expensive.
➢ Research and development are required to build a system with advanced
applications. For example, generating high-resolution images using
GANs can be challenging.
➢ Finding an effective data augmentation approach can be challenging.
17. AUDIO DATA AUGMENTATION
➢ Noise injection: add
Gaussian or random noise.
➢ Shifting: shift the audio left (fast
forward) or right by a
random number of seconds.
➢ Changing the speed:
stretch the time series by a
fixed rate.
➢ Changing the pitch:
randomly change the pitch of
the audio.
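Three of these operations can be sketched with NumPy alone. This is an illustrative sketch, not a reference implementation: the function names, the noise level and the sampling rate are assumptions. The simple resampling used for the speed change also alters the pitch, and independent pitch shifting usually needs a dedicated audio library, so it is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)


def add_noise(signal, noise_level=0.005):
    """Noise injection: add zero-mean Gaussian noise."""
    return signal + noise_level * rng.standard_normal(signal.shape)


def shift(signal, max_shift):
    """Shifting: move the signal left or right by a random number of samples.

    The vacated samples are filled with silence (zeros).
    """
    offset = int(rng.integers(-max_shift, max_shift + 1))
    shifted = np.zeros_like(signal)
    if offset > 0:
        shifted[offset:] = signal[:-offset]
    elif offset < 0:
        shifted[:offset] = signal[-offset:]
    else:
        shifted[:] = signal
    return shifted


def change_speed(signal, rate):
    """Changing the speed: resample by linear interpolation.

    rate > 1 shortens (speeds up) the signal; rate < 1 lengthens it.
    """
    new_length = int(round(len(signal) / rate))
    old_positions = np.arange(len(signal))
    new_positions = np.linspace(0, len(signal) - 1, new_length)
    return np.interp(new_positions, old_positions, signal)


sample_rate = 16000  # hypothetical sampling rate
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)  # one second of a 440 Hz tone

noisy = add_noise(tone)
shifted = shift(tone, max_shift=sample_rate // 10)
faster = change_speed(tone, rate=1.25)
print(len(tone), len(noisy), len(shifted), len(faster))
```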
18. TEXT DATA AUGMENTATION
➢ Word or sentence shuffling
➢ Word replacement
➢ Syntax-tree manipulation
➢ Random word insertion
➢ Random word deletion
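Most of these operations work on a plain list of words. The sketch below uses only the Python standard library; the function names, probabilities and the tiny replacement table are assumptions made for illustration. Syntax-tree manipulation needs a parser and is not shown.

```python
import random


def shuffle_words(words, rng):
    """Word shuffling: return the words in a random order."""
    shuffled = list(words)
    rng.shuffle(shuffled)
    return shuffled


def replace_words(words, replacements, rng, p=0.5):
    """Word replacement: swap a word for a listed alternative with probability p."""
    return [
        rng.choice(replacements[w]) if w in replacements and rng.random() < p else w
        for w in words
    ]


def insert_random_word(words, vocabulary, rng):
    """Random word insertion: insert one vocabulary word at a random position."""
    result = list(words)
    result.insert(rng.randint(0, len(result)), rng.choice(vocabulary))
    return result


def delete_random_words(words, rng, p=0.2):
    """Random word deletion: drop each word with probability p, keep at least one."""
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]


rng = random.Random(0)
sentence = "the scan shows a small tumor".split()
alternatives = {"small": ["tiny", "little"], "shows": ["reveals"]}  # toy table

print(" ".join(shuffle_words(sentence, rng)))
print(" ".join(replace_words(sentence, alternatives, rng)))
print(" ".join(insert_random_word(sentence, ["clearly", "possibly"], rng)))
print(" ".join(delete_random_words(sentence, rng)))
```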
19. IMAGE AUGMENTATION
➢ Geometric transformations: randomly flip, crop, rotate, stretch,
and zoom images.
➢ Color space transformations: randomly change RGB color
channels, contrast, saturation and brightness.
➢ Kernel filters: randomly change the sharpness or blurring of the
image.
➢ Random erasing: delete some part of the initial image.
➢ Mixing images: blending and mixing multiple images.
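A few of these transformations can be sketched with NumPy on an image stored as a height x width x channels array scaled to 0-1. The function names and parameter values are illustrative assumptions. Rotation, zoom and kernel filters are left out because they are usually done with an image library.

```python
import numpy as np

rng = np.random.default_rng(0)


def random_flip(image):
    """Geometric: flip horizontally with probability 0.5."""
    return image[:, ::-1] if rng.random() < 0.5 else image


def random_crop(image, crop_h, crop_w):
    """Geometric: cut a random crop_h x crop_w window out of the image."""
    h, w = image.shape[:2]
    top = int(rng.integers(0, h - crop_h + 1))
    left = int(rng.integers(0, w - crop_w + 1))
    return image[top:top + crop_h, left:left + crop_w]


def random_brightness(image, max_delta=0.2):
    """Color space: add a random offset, keeping values in [0, 1]."""
    delta = rng.uniform(-max_delta, max_delta)
    return np.clip(image + delta, 0.0, 1.0)


def random_erase(image, erase_h, erase_w):
    """Random erasing: zero out a random rectangle."""
    h, w = image.shape[:2]
    top = int(rng.integers(0, h - erase_h + 1))
    left = int(rng.integers(0, w - erase_w + 1))
    erased = image.copy()
    erased[top:top + erase_h, left:left + erase_w] = 0.0
    return erased


def mix(image_a, image_b, alpha=0.5):
    """Mixing images: pixel-wise blend of two images of the same shape."""
    return alpha * image_a + (1.0 - alpha) * image_b


image = rng.random((224, 224, 3))  # stand-in for a real image scaled to [0, 1]
other = rng.random((224, 224, 3))

augmented = random_erase(random_brightness(random_flip(image)), 56, 56)
cropped = random_crop(image, 200, 200)
blended = mix(image, other)
print(augmented.shape, cropped.shape, blended.shape)
```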
21. Adversarial Training based Augmentation
The objective is to transform the
images to deceive a deep-learning
model.
The method learns to generate
masks which, when applied to the
input image, generate different
augmented images.
22. GAN based Augmentation
Synthesize images for data
augmentation.
The goal of the generator is to generate fake
images from the latent space, and
the goal of the discriminator is to
distinguish the synthetic fake
images from the real images.
23. Neural Style Transfer based Augmentation
Deep Neural Networks are
trained to extract the content (high
level features) from one image
and the style (low level features) from
another image and compose the
augmented image using the
extracted content and style.
24. Data Augmentation in Medical Field
Points to remember
➢ Image quality
➢ Tumor location and size
➢ Class imbalance
➢ Validation and evaluation
27. 2. Data Preparation and Preprocessing
➢ Convert the image to grayscale, and blur it slightly
➢ Threshold the image, then perform a series of erosions and dilations to
remove any small regions of noise
➢ Crop new image out of the original image using the four extreme points
(left, right, top, bottom)
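The steps above can be sketched with NumPy alone. This is a simplified stand-in under stated assumptions, not the original pipeline: it uses a plain channel mean for grayscale, a 3x3 box blur, 3x3 erosion and dilation, and takes the extreme points of all remaining foreground pixels. In practice an image library such as OpenCV would typically be used, with the extreme points taken from the largest contour. The threshold of 45 and the function names are hypothetical.

```python
import numpy as np


def _neighborhood(array):
    """Stack the nine shifted copies that make up each pixel's 3x3 neighborhood."""
    h, w = array.shape
    padded = np.pad(array, 1, mode="edge")
    return np.stack([padded[dy:dy + h, dx:dx + w]
                     for dy in range(3) for dx in range(3)])


def crop_to_foreground(image, threshold=45, iterations=2):
    """Crop an (H, W, 3) uint8 image to the bounding box of its bright region."""
    gray = image.astype(np.float64).mean(axis=2)   # grayscale (simple mean)
    blurred = _neighborhood(gray).mean(axis=0)     # slight 3x3 box blur
    mask = blurred > threshold                     # threshold
    for _ in range(iterations):                    # erosions remove small specks
        mask = _neighborhood(mask).min(axis=0)
    for _ in range(iterations):                    # dilations restore the size
        mask = _neighborhood(mask).max(axis=0)
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return image                               # nothing found: keep as is
    top, bottom = rows.min(), rows.max()           # extreme points
    left, right = cols.min(), cols.max()
    return image[top:bottom + 1, left:right + 1]


image = np.zeros((100, 100, 3), dtype=np.uint8)
image[30:70, 20:80] = 255  # bright region standing in for the brain
image[5:7, 5:7] = 255      # small speck of noise
cropped = crop_to_foreground(image)
print(cropped.shape)
```

The speck in the corner does not survive the erosions, so the crop follows the large bright region only.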
30. 3. Load the data
➢ Read the image.
➢ Crop the part of the image representing only the brain.
➢ Resize the image.
➢ Apply normalization because we want pixel values to be scaled to the
range 0-1.
➢ Append the image to X and its label to y.
➢ Shuffle X and y.
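A minimal sketch of this loading loop, assuming the images have already been read and cropped and arrive as `(image, label)` pairs. The nearest-neighbor resize is a simple stand-in for a library resize, and the 224 by 224 target size and all names are assumptions.

```python
import numpy as np


def resize_nearest(image, out_h, out_w):
    """Resize by nearest-neighbor sampling (stand-in for a library resize)."""
    h, w = image.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return image[rows][:, cols]


def load_data(samples, image_size=(224, 224), seed=0):
    """Build shuffled arrays X and y from (image, label) pairs."""
    X, y = [], []
    for image, label in samples:
        image = resize_nearest(image, *image_size)
        image = image.astype(np.float32) / 255.0  # scale pixels to the range 0-1
        X.append(image)
        y.append(label)
    X, y = np.array(X), np.array(y)
    # One permutation for both arrays keeps every image with its label.
    order = np.random.default_rng(seed).permutation(len(X))
    return X[order], y[order]


# Random uint8 arrays of different heights stand in for cropped scans.
rng = np.random.default_rng(1)
samples = [(rng.integers(0, 256, size=(120 + i, 150, 3), dtype=np.uint8), i % 2)
           for i in range(10)]
X, y = load_data(samples)
print(X.shape, y.shape)
```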
33. 4. Split the data
Split X and y into training, validation (development) and test sets.
➢ 70% of the data for training.
➢ 15% of the data for validation.
➢ 15% of the data for testing.
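A minimal sketch of the 70/15/15 split, assuming X and y were already shuffled when the data was loaded. The function name `split_data` is illustrative.

```python
import numpy as np


def split_data(X, y, train_frac=0.70, val_frac=0.15):
    """Split already-shuffled X and y into train, validation, and test parts."""
    n = len(X)
    train_end = int(round(n * train_frac))
    val_end = train_end + int(round(n * val_frac))
    return ((X[:train_end], y[:train_end]),
            (X[train_end:val_end], y[train_end:val_end]),
            (X[val_end:], y[val_end:]))  # the remainder is the test set


X = np.arange(200).reshape(100, 2)  # 100 toy samples
y = np.arange(100) % 2
(train_X, train_y), (val_X, val_y), (test_X, test_y) = split_data(X, y)
print(len(train_X), len(val_X), len(test_X))  # 70 15 15
```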
34. 5. Build the model
➢ Load the VGG16 model, pretrained on ImageNet
➢ Freeze the layers in the base model so that they are not trainable
➢ Create a new model that includes the VGG16 base model and additional layers for
classification
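One way to express these three steps is with Keras. The sketch below assumes TensorFlow's bundled Keras, a 224 by 224 RGB input, and a binary (tumor / no tumor) output with a single sigmoid unit. The slides do not state the framework, the added layers or the number of classes, so all of these are assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.applications import VGG16


def build_model(input_shape=(224, 224, 3), weights="imagenet"):
    """VGG16 base with frozen weights plus a small classification head."""
    # Load VGG16 without its original 1000-class fully connected layers.
    base = VGG16(weights=weights, include_top=False, input_shape=input_shape)
    base.trainable = False  # freeze the layers in the base model

    inputs = keras.Input(shape=input_shape)
    x = base(inputs, training=False)
    x = layers.Flatten()(x)
    # Assumption: binary classification with one sigmoid output unit.
    outputs = layers.Dense(1, activation="sigmoid")(x)

    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```

Calling `build_model()` downloads the ImageNet weights on first use. Only the new dense layer is trained, so training updates far fewer weights than the full network contains.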