This video will show how to import the Torchvision CIFAR10 dataset.
CIFAR10 is a dataset of 60,000 32x32 color images of common objects, split evenly across 10 classes, with 50,000 training images and 10,000 test images.
First, we will import torch.
import torch
Then we will import torchvision.
import torchvision
Torchvision is a package in the PyTorch library containing computer-vision models, datasets, and image transformations.
Then we will import torchvision.datasets as datasets.
import torchvision.datasets as datasets
Before we begin, we need to make sure that our installed versions of both torch and torchvision are current.
We need to print the torch version:
print(torch.__version__)
As well as the torchvision version:
print(torchvision.__version__)
As of February 24, 2018, 0.3.1 for torch and 0.2.0 for torchvision are the current versions.
Our printed versions match these, so our installation is up to date.
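If we would rather check the versions in code than by eye, here is a minimal sketch. The helpers parse_version and meets_minimum are hypothetical names made up for this example and are not part of torch or torchvision. The minimum versions are simply the ones mentioned above.

```python
import re


def parse_version(version):
    """Return the leading numeric part of a version string as a tuple of ints."""
    # Keep only the leading dotted-number part, e.g. "0.3.1.post2" -> "0.3.1"
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum version."""
    a, b = parse_version(installed), parse_version(minimum)
    # Zero-pad the shorter tuple so that "1.0" and "1.0.0" compare as equal
    width = max(len(a), len(b))
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return a >= b


try:
    import torch
    import torchvision
except ImportError:
    print("torch and/or torchvision are not installed")
else:
    # Minimum versions assumed here are the ones quoted in this video
    print("torch ok:", meets_minimum(torch.__version__, "0.3.1"))
    print("torchvision ok:", meets_minimum(torchvision.__version__, "0.2.0"))
```

This only compares the numeric part of the version, so local build suffixes are ignored.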
Moving on, to access the dataset, we will do the following.
We can initialize the CIFAR training set using cifar_trainset = datasets.CIFAR10 with the parameters root='./data', train=True, download=True, and transform=None.
cifar_trainset = datasets.CIFAR10(root='./data', train=True, download=True, transform=None)
Similarly, we can initialize the test set using cifar_testset = datasets.CIFAR10 with the parameters root='./data', train=False, download=True, and transform=None.
cifar_testset = datasets.CIFAR10(root='./data', train=False, download=True, transform=None)
The root parameter of datasets.CIFAR10 specifies the directory where the dataset is, or will be, stored.
If the train parameter is set to True, we get the training dataset, and if it is set to False, we get the test dataset.
The default for this parameter is True.
If the download parameter is set to True and the dataset is not found in the directory specified by the root parameter, then the dataset will be downloaded from the internet.
This parameter defaults to False and must be set to True if the CIFAR10 dataset is not already present at root.
Lastly, the transform parameter, which defaults to None, takes a function that is applied to each image as it is loaded from the dataset.
Common uses for this parameter are normalization and augmentation of the data.