36. • Data augmentation is a technique for artificially creating new training data from existing training data. This is done by applying domain-specific transforms to examples from the training data, producing new and different training examples.
• Image data augmentation is perhaps the most well-known type of data augmentation and involves creating
transformed versions of images in the training dataset that belong to the same class as the original image.
• Transforms include a range of operations from the field of image manipulation, such as shifts, flips, zooms, and
much more.
• The intent is to expand the training dataset with new, plausible examples, that is, variations of the training-set images that the model is likely to see. For example, a horizontal flip of a picture of a cat makes sense, because the photo could have been taken from the left or the right.
• A vertical flip of the photo of a cat does not make sense and would probably not be appropriate, given that the model is very unlikely to see a photo of an upside-down cat.
37. • As such, it is clear that the specific data augmentation techniques used for a training dataset must be chosen carefully, within the context of the training dataset and with knowledge of the problem domain.
• In addition, it can be useful to experiment with data augmentation methods in isolation and in concert to see
if they result in a measurable improvement to model performance, perhaps with a small prototype dataset,
model, and training run.
• Modern deep learning algorithms, such as the convolutional neural network, or CNN, can learn features that
are invariant to their location in the image.
• Nevertheless, augmentation can further support this transform-invariant approach to learning and can help the model learn features that are also invariant to transforms such as left-to-right and top-to-bottom ordering, light levels in photographs, and more.
• Image data augmentation is typically applied only to the training dataset, not to the validation or test dataset. This is different from data preparation such as image resizing and pixel scaling, which must be performed consistently across all datasets that interact with the model.
38. Some of the most common data augmentation techniques used for images are:
Position augmentation
• Scaling
• Cropping
• Flipping
• Padding
• Rotation
• Translation
• Affine transformation
Color augmentation
• Brightness
• Contrast
• Saturation
• Hue
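Many of these transforms are one-liners in tf.image; the sketch below is illustrative only (the parameter values are assumptions, for a float image of shape 128x128x3 in the range [0, 1]):

import tensorflow as tf

def augment_image(image):
    image = tf.image.random_flip_left_right(image)            # flipping
    image = tf.image.random_crop(image, size=(100, 100, 3))   # cropping
    image = tf.image.resize(image, (128, 128))                # scaling back up
    image = tf.image.rot90(image)                             # rotation (90 degrees)
    image = tf.image.random_brightness(image, max_delta=0.2)  # brightness
    image = tf.image.random_contrast(image, 0.8, 1.2)         # contrast
    image = tf.image.random_saturation(image, 0.8, 1.2)       # saturation
    image = tf.image.random_hue(image, max_delta=0.05)        # hue
    # keep pixel values in a valid range after the color jitter
    return tf.clip_by_value(image, 0.0, 1.0)

Translation, padding, and affine transforms are similarly available through Keras preprocessing layers such as RandomTranslation and RandomRotation.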
50. Image segmentation is a computer vision task that segments an image into multiple areas by assigning
a label to every pixel of the image. It provides much more information about an image than object
detection, which draws a bounding box around the detected object, or image classification, which
assigns a label to the object.
Segmentation is used in real-world applications such as medical imaging, clothes segmentation, flood mapping, self-driving cars, etc.
There are two types of image segmentation:
• Semantic segmentation: classify each pixel with a label.
• Instance segmentation: classify each pixel and differentiate each object instance.
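As a toy illustration (the labels here are hypothetical), consider an image containing two cats: a semantic mask assigns every cat pixel the same class id, while an instance mask distinguishes the two animals:

import numpy as np

# Semantic segmentation: both cats share class id 1.
semantic_mask = np.array([[0, 1, 1, 0, 1, 1],
                          [0, 1, 1, 0, 1, 1]])

# Instance segmentation: each cat gets its own instance id.
instance_mask = np.array([[0, 1, 1, 0, 2, 2],
                          [0, 1, 1, 0, 2, 2]])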
U-Net is a semantic segmentation technique originally proposed for medical imaging segmentation. It’s
one of the earlier deep learning segmentation models, and the U-Net architecture is also used in many
GAN variants such as the Pix2Pix generator.
51. U-Net Architecture
The model architecture is fairly simple: an encoder (for downsampling) and a decoder (for upsampling) with skip connections. As the figure shows, it is shaped like the letter U, hence the name U-Net.
The gray arrows indicate the skip connections that concatenate the encoder feature map with the decoder,
which helps the backward flow of gradients for improved training.
52. Import required libraries
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import tensorflow_datasets as tfds
import matplotlib.pyplot as plt
import numpy as np
Dataset
We can use tfds to load the dataset by specifying its name, and get the dataset info by setting with_info=True:
dataset, info = tfds.load('oxford_iiit_pet:3.*.*', with_info=True)
53. Print the dataset info with print(info), and you will see detailed information about the Oxford pet dataset. For example, as the figure below shows, there are a total of 7349 images with a built-in train/test split.
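The split sizes can also be read programmatically (the printed form below is abbreviated and may vary by tfds version):

print(info.splits)
# e.g. {'train': <SplitInfo num_examples=3680>, 'test': <SplitInfo num_examples=3669>}
print(info.splits["train"].num_examples)  # 3680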
54. We make a few changes to the downloaded data before we start training U-Net with it.
First, resize the images and masks to 128x128:

def resize(input_image, input_mask):
    input_image = tf.image.resize(input_image, (128, 128), method="nearest")
    input_mask = tf.image.resize(input_mask, (128, 128), method="nearest")
    return input_image, input_mask
Next, create a function to augment the dataset by randomly flipping the images and masks horizontally:

def augment(input_image, input_mask):
    if tf.random.uniform(()) > 0.5:
        # Random flipping of the image and mask
        input_image = tf.image.flip_left_right(input_image)
        input_mask = tf.image.flip_left_right(input_mask)
    return input_image, input_mask
55. Create a function to normalize the dataset by scaling the images to the range [0, 1] and decreasing the image mask by 1 (the Oxford-IIIT Pet masks use class labels {1, 2, 3}, so this maps them to {0, 1, 2}):

def normalize(input_image, input_mask):
    input_image = tf.cast(input_image, tf.float32) / 255.0
    input_mask -= 1
    return input_image, input_mask
Then we create two functions to preprocess the training and test datasets, with a slight difference between the two: we only perform image augmentation on the training dataset.

def load_image_train(datapoint):
    input_image = datapoint["image"]
    input_mask = datapoint["segmentation_mask"]
    input_image, input_mask = resize(input_image, input_mask)
    input_image, input_mask = augment(input_image, input_mask)
    input_image, input_mask = normalize(input_image, input_mask)
    return input_image, input_mask

def load_image_test(datapoint):
    input_image = datapoint["image"]
    input_mask = datapoint["segmentation_mask"]
    input_image, input_mask = resize(input_image, input_mask)
    input_image, input_mask = normalize(input_image, input_mask)
    return input_image, input_mask
56. Now, build an input pipeline with tf.data by using the map() function:
train_dataset = dataset["train"].map(load_image_train, num_parallel_calls=tf.data.AUTOTUNE)
test_dataset = dataset["test"].map(load_image_test, num_parallel_calls=tf.data.AUTOTUNE)
If we execute print(train_dataset), we will see that the images have shape 128x128x3 with dtype tf.float32, while the masks have shape 128x128x1 with dtype tf.uint8.
We define a batch size of 64 and a buffer size of 1000 for creating batches of the training and test datasets. With the original TFDS dataset, there are 3680 training samples and 3669 test samples; the test samples are further split into validation and test sets. We will use train_batches and validation_batches for training the U-Net model. After the training finishes, we will use test_batches to test the model predictions.
BATCH_SIZE = 64
BUFFER_SIZE = 1000
train_batches = train_dataset.cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE).repeat()
train_batches = train_batches.prefetch(buffer_size=tf.data.AUTOTUNE)
validation_batches = test_dataset.take(3000).batch(BATCH_SIZE)
test_batches = test_dataset.skip(3000).take(669).batch(BATCH_SIZE)
57. Now the datasets are ready for training. Let’s visualize a random sample image and its mask from the
training dataset, to get an idea of how the data looks.
def display(display_list):
    plt.figure(figsize=(15, 15))
    title = ["Input Image", "True Mask", "Predicted Mask"]
    for i in range(len(display_list)):
        plt.subplot(1, len(display_list), i+1)
        plt.title(title[i])
        plt.imshow(tf.keras.utils.array_to_img(display_list[i]))
        plt.axis("off")
    plt.show()

sample_batch = next(iter(train_batches))
random_index = np.random.choice(sample_batch[0].shape[0])
sample_image, sample_mask = sample_batch[0][random_index], sample_batch[1][random_index]
display([sample_image, sample_mask])
Output: (figure showing a sample training image and its true mask)
58. Model Architecture
Now that we have the data ready for training, let’s define the U-Net model architecture. As mentioned earlier, the
U-Net is shaped like a letter U with an encoder, decoder, and the skip connections between them. So we will create
a few building blocks to make the U-Net model.
Building blocks
First, we create a function double_conv_block with layers Conv2D-ReLU-Conv2D-ReLU, which we will use in both the
encoder (or the contracting path) and the bottleneck of the U-Net.
def double_conv_block(x, n_filters):
    # Conv2D then ReLU activation
    x = layers.Conv2D(n_filters, 3, padding="same", activation="relu", kernel_initializer="he_normal")(x)
    # Conv2D then ReLU activation
    x = layers.Conv2D(n_filters, 3, padding="same", activation="relu", kernel_initializer="he_normal")(x)
    return x
Then we define a downsample_block function for downsampling or feature extraction to be used in the encoder.
def downsample_block(x, n_filters):
    f = double_conv_block(x, n_filters)
    p = layers.MaxPool2D(2)(f)
    p = layers.Dropout(0.3)(p)
    return f, p
59. Finally, we define an upsampling function upsample_block for the decoder (or expanding path) of the U-Net.
def upsample_block(x, conv_features, n_filters):
    # upsample
    x = layers.Conv2DTranspose(n_filters, 3, 2, padding="same")(x)
    # concatenate with the encoder feature map (skip connection)
    x = layers.concatenate([x, conv_features])
    # dropout
    x = layers.Dropout(0.3)(x)
    # Conv2D twice with ReLU activation
    x = double_conv_block(x, n_filters)
    return x
60. U-Net has a fairly simple architecture; however, to create the skip connections between
the encoder and decoder, we will need to concatenate some layers. So the Keras
Functional API is most appropriate for this purpose.
First, we create a build_unet_model function and specify the inputs, encoder layers, bottleneck, decoder layers, and finally the output layer: a Conv2D with softmax activation. Note the input image shape is 128x128x3. The output has three channels, corresponding to the three classes the model classifies each pixel into: background, foreground object, and object outline.
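A minimal sketch of build_unet_model, assembled from the building blocks defined above (the filter counts, doubling from 64 up to a 1024-filter bottleneck, follow the original U-Net paper and are assumptions here):

def build_unet_model():
    # inputs
    inputs = layers.Input(shape=(128, 128, 3))

    # encoder: contracting path with downsampling
    f1, p1 = downsample_block(inputs, 64)
    f2, p2 = downsample_block(p1, 128)
    f3, p3 = downsample_block(p2, 256)
    f4, p4 = downsample_block(p3, 512)

    # bottleneck
    bottleneck = double_conv_block(p4, 1024)

    # decoder: expanding path with skip connections
    u6 = upsample_block(bottleneck, f4, 512)
    u7 = upsample_block(u6, f3, 256)
    u8 = upsample_block(u7, f2, 128)
    u9 = upsample_block(u8, f1, 64)

    # outputs: one channel per class, softmax over classes
    outputs = layers.Conv2D(3, 1, padding="same", activation="softmax")(u9)

    return keras.Model(inputs, outputs, name="U-Net")

unet_model = build_unet_model()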
62. Compile and Train U-Net
To compile unet_model, we specify the optimizer, the loss function, and the accuracy metrics to track
during training:
unet_model.compile(optimizer=tf.keras.optimizers.Adam(),
                   loss="sparse_categorical_crossentropy",
                   metrics=["accuracy"])
We train the unet_model by calling unet_model.fit() and training it for 20 epochs.

NUM_EPOCHS = 20
TRAIN_LENGTH = info.splits["train"].num_examples
STEPS_PER_EPOCH = TRAIN_LENGTH // BATCH_SIZE
VAL_SUBSPLITS = 5
TEST_LENGTH = info.splits["test"].num_examples
VALIDATION_STEPS = TEST_LENGTH // BATCH_SIZE // VAL_SUBSPLITS

model_history = unet_model.fit(train_batches,
                               epochs=NUM_EPOCHS,
                               steps_per_epoch=STEPS_PER_EPOCH,
                               validation_steps=VALIDATION_STEPS,
                               validation_data=validation_batches)
63. After training for 20 epochs, we get a training accuracy and a validation accuracy of ~0.88. The learning curves indicate that the model is doing well on both the training and validation sets, i.e., it is generalizing well without much overfitting (as shown in the figure below).
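A minimal sketch to reproduce such a learning-curve plot from model_history (assuming the default Keras history keys for the accuracy metric):

# Plot training vs. validation accuracy over epochs
acc = model_history.history["accuracy"]
val_acc = model_history.history["val_accuracy"]
epochs_range = range(NUM_EPOCHS)

plt.figure()
plt.plot(epochs_range, acc, label="training accuracy")
plt.plot(epochs_range, val_acc, label="validation accuracy")
plt.xlabel("epoch")
plt.ylabel("accuracy")
plt.legend()
plt.show()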
64. Prediction
Now that we have completed training the unet_model, let’s use it to make predictions on a
few sample images of the test dataset.
def create_mask(pred_mask):
    pred_mask = tf.argmax(pred_mask, axis=-1)
    pred_mask = pred_mask[..., tf.newaxis]
    return pred_mask[0]

def show_predictions(dataset=None, num=1):
    if dataset:
        for image, mask in dataset.take(num):
            pred_mask = unet_model.predict(image)
            display([image[0], mask[0], create_mask(pred_mask)])
    else:
        display([sample_image, sample_mask,
                 create_mask(unet_model.predict(sample_image[tf.newaxis, ...]))])
count = 0
for i in test_batches:
    count += 1
print("number of batches:", count)
65. The figure below shows the input images, the true masks, and the masks predicted by the trained U-Net model.