Augment the CIFAR10 Dataset Using the RandomHorizontalFlip and RandomCrop Transforms

Video Transcript

Data augmentation is the process of artificially enlarging your training dataset using carefully chosen transforms.

When used appropriately, data augmentation can make your trained models more robust and capable of achieving higher accuracy without requiring a larger dataset.

For those who are familiar with it, data augmentation acts much like regularization: it can reduce over-fitting relative to an otherwise identical model trained without augmentation on the same dataset for the same number of epochs.

Two very useful transforms of this type that are commonly used in computer vision are random flipping and random cropping.

In torchvision, random flipping can be achieved with the RandomHorizontalFlip and RandomVerticalFlip transforms, while random cropping can be achieved with the RandomCrop transform.

We first need to import torch:

import torch

Then import torchvision:

import torchvision

Then import torchvision.transforms as transforms:

import torchvision.transforms as transforms

And then import torchvision.datasets as datasets:

import torchvision.datasets as datasets

We also want to check that our installed versions of torch and torchvision are current.

print(torch.__version__)

print(torchvision.__version__)

We can then define a composed training dataset transform as follows:

first_train_transform = transforms.Compose(
                [transforms.RandomHorizontalFlip(),
                 transforms.RandomCrop(size=[32,32], padding=4),
                 transforms.ToTensor()])

RandomHorizontalFlip without arguments will simply randomly flip the image horizontally with probability 0.5.
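To make the probability argument concrete, here is a minimal pure-Python sketch of the idea. The helper random_horizontal_flip below is hypothetical and written only for illustration; it is not torchvision's implementation, and it treats an image as a plain list of pixel rows.

```python
import random

def random_horizontal_flip(image, p=0.5, rng=random):
    """Flip a row-major image (a list of rows) left to right with probability p.

    Hypothetical helper for illustration only; not torchvision's code.
    """
    if rng.random() < p:
        # Reverse every row so the left and right columns swap places.
        return [row[::-1] for row in image]
    return image

image = [[1, 2, 3],
         [4, 5, 6]]

always = random_horizontal_flip(image, p=1.0)  # p=1.0 flips on every call
never = random_horizontal_flip(image, p=0.0)   # p=0.0 never flips

# With the default p=0.5, roughly half of the calls return a flipped image.
rng = random.Random(0)
flips = sum(random_horizontal_flip(image, rng=rng) != image for _ in range(10_000))

print(always)
print(never)
print(flips / 10_000)
```

Because the flip is decided again on every call, the same training image can appear flipped in one epoch and unflipped in the next.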

RandomCrop takes a more detailed set of parameters.

Firstly, the size parameter is either a sequence or integer indicating the output size of RandomCrop.

If an integer is provided, the output will be a square crop with side length equal to the integer provided:

second_train_transform = transforms.Compose(
               [transforms.RandomHorizontalFlip(),
                transforms.RandomCrop(size=32, padding=4),
                transforms.ToTensor()])

That means that instead of specifying the parameter as [32,32] as in the first train transform, I could simply write size=32 as in the second train transform, meaning that first train transform and second train transform are actually identical.

The padding parameter indicates how much padding we want to add to the edges of the image before cropping. By default, the padded pixels are filled with zeros, so the added border is black rather than white.

If an integer is provided, as in second train transform, then equal padding is added to all sides.

On the other hand, if a sequence of length two is provided, as in third train transform, then the first value is the padding added to the left and right and the second value is the padding added to the top and bottom.

third_train_transform = transforms.Compose(
               [transforms.RandomHorizontalFlip(),
                transforms.RandomCrop(size=32, padding=[0, 4]),
                transforms.ToTensor()])

In the case of third train transform, no padding will be added to the left and right but padding of size 4 will be added to the top and bottom.

Lastly, if a sequence of length 4 is provided, then its values give the padding added to the left, top, right, and bottom of the image, in that order.

fourth_train_transform = transforms.Compose(
               [transforms.RandomHorizontalFlip(),
                transforms.RandomCrop(size=32, padding=[0, 2, 3, 4]),
                transforms.ToTensor()])

In the case of fourth train transform, this means that no padding will be added to the left, a padding of size 2 will be added to the top, a padding of size 3 will be added to the right, and padding of size 4 will be added to the bottom.
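The three padding forms can be summarized with a little arithmetic. The sketch below is plain Python and does not call torchvision; expand_padding and crop_geometry are hypothetical helpers introduced here only to show how each form maps to (left, top, right, bottom), how large a 32x32 image becomes after padding, and how many distinct 32x32 crop positions that leaves.

```python
def expand_padding(padding):
    """Return (left, top, right, bottom) for an int or a sequence of length 2 or 4."""
    if isinstance(padding, int):
        return (padding, padding, padding, padding)
    if len(padding) == 2:
        # Length 2: first value is left/right, second value is top/bottom.
        left_right, top_bottom = padding
        return (left_right, top_bottom, left_right, top_bottom)
    if len(padding) == 4:
        # Length 4: already in left, top, right, bottom order.
        return tuple(padding)
    raise ValueError("padding must be an int or a sequence of length 2 or 4")

def crop_geometry(height, width, padding, size=32):
    """Return (padded height, padded width, number of distinct crop positions)."""
    left, top, right, bottom = expand_padding(padding)
    padded_h = height + top + bottom
    padded_w = width + left + right
    positions = (padded_h - size + 1) * (padded_w - size + 1)
    return padded_h, padded_w, positions

for padding in (4, [0, 4], [0, 2, 3, 4]):
    print(padding, expand_padding(padding), crop_geometry(32, 32, padding))
```

With padding=4 the image grows to 40x40, so the 32x32 crop window can start at any of 9 offsets along each axis. That is what lets RandomCrop shift the image content by up to 4 pixels in any direction.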

It is worth noting that both RandomHorizontalFlip and RandomCrop appear before the ToTensor transform in each Compose transform. In older versions of torchvision this order is required because RandomHorizontalFlip and RandomCrop are designed to act on PIL images, not tensors; more recent releases also accept tensors.

Any of these Compose transforms can be passed through the transform parameter of the dataset constructor, which in this case is datasets.CIFAR10.

cifar_trainset = datasets.CIFAR10(root='./data', train=True, download=True, transform=fourth_train_transform)