AvgPool2d: How to Incorporate Average Pooling into a PyTorch Neural Network

AvgPool2d - Use the PyTorch AvgPool2d module to incorporate average pooling into a PyTorch neural network


Video Transcript


It is common practice to use either max pooling or average pooling at the end of a neural network, just before the output layer, to reduce the features to a smaller, summarized form.


Max pooling strips away all information in each kernel window except for the strongest signal.


Average pooling summarizes the signal in each kernel window as a single average.
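
As a quick illustration (the 4x4 tensor below is made-up example data), here is how the two pooling modules treat the same input:

import torch
import torch.nn as nn

x = torch.tensor([[[[ 1.,  2.,  3.,  4.],
                    [ 5.,  6.,  7.,  8.],
                    [ 9., 10., 11., 12.],
                    [13., 14., 15., 16.]]]])  # shape (1, 1, 4, 4)

print(nn.MaxPool2d(kernel_size=2)(x))  # keeps the largest value in each 2x2 window:
# tensor([[[[ 6.,  8.],
#           [14., 16.]]]])
print(nn.AvgPool2d(kernel_size=2)(x))  # keeps the mean of each 2x2 window:
# tensor([[[[ 3.5000,  5.5000],
#           [11.5000, 13.5000]]]])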


The displayed example network

import torch.nn as nn

class Convolutional(nn.Module):
    def __init__(self, input_channels=3, num_classes=10):
        super(Convolutional, self).__init__()
        # First block: 3x3 convolution, batch norm, ReLU
        self.layer1 = nn.Sequential()
        self.layer1.add_module("Conv1", nn.Conv2d(in_channels=input_channels, out_channels=16, kernel_size=3, padding=1))
        self.layer1.add_module("BN1", nn.BatchNorm2d(num_features=16, eps=1e-05, momentum=0.1, affine=True,
            track_running_stats=True))
        self.layer1.add_module("Relu1", nn.ReLU(inplace=False))
        # Second block: the stride of 2 halves the spatial dimensions
        self.layer2 = nn.Sequential()
        self.layer2.add_module("Conv2", nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, padding=1, stride=2))
        self.layer2.add_module("BN2", nn.BatchNorm2d(num_features=32, eps=1e-05, momentum=0.1, affine=True,
            track_running_stats=True))
        self.layer2.add_module("Relu2", nn.ReLU(inplace=False))
        # Average pooling reduces each 16x16 feature map to 4x4
        self.avg_pool = nn.AvgPool2d(kernel_size=4, stride=4, padding=0, ceil_mode=False,
            count_include_pad=False)
        self.fully_connected = nn.Linear(32 * 4 * 4, num_classes)

    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.avg_pool(x)
        x = x.view(-1, 32 * 4 * 4)  # flatten for the fully connected layer
        x = self.fully_connected(x)
        return x

uses the AvgPool2d module

self.avg_pool("AvgPool1", nn.AvgPool2D(kernel_size=4, stride=4, padding=0, ceil_mode=False,
            count_include_pad=False))

to perform average pooling on the output of the second convolutional layer.


The kernel size argument

kernel_size=4

is required and determines the size of the window over which each average is taken.


I chose 4 because it evenly divides the height and width of the output of the Conv2d layer above it, which is 16x16.
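
As a sanity check (the random tensor below just stands in for layer2's output), pooling a 16x16 feature map with a 4x4 kernel produces a 4x4 output:

import torch
import torch.nn as nn

features = torch.randn(1, 32, 16, 16)  # one sample, 32 channels, 16x16 each
pooled = nn.AvgPool2d(kernel_size=4)(features)
print(pooled.shape)  # torch.Size([1, 32, 4, 4])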


The stride

stride=4

can be specified as smaller or larger than the kernel size. Having it equal to the kernel size ensures that the output averages do not overlap, which happens when the stride is less than the kernel size, and that no values are skipped over, which happens when the stride is greater than the kernel size.

If not specified, stride defaults to the kernel size.
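
To see the effect, here is a sketch pooling the same 16x16 input with kernel_size=4 and three different strides:

import torch
import torch.nn as nn

x = torch.randn(1, 32, 16, 16)
print(nn.AvgPool2d(kernel_size=4, stride=4)(x).shape)  # torch.Size([1, 32, 4, 4]) - no overlap, nothing skipped
print(nn.AvgPool2d(kernel_size=4, stride=2)(x).shape)  # torch.Size([1, 32, 7, 7]) - windows overlap
print(nn.AvgPool2d(kernel_size=4, stride=6)(x).shape)  # torch.Size([1, 32, 3, 3]) - some values skipped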


Padding

padding=0

is the implicit zero padding to be added to the edges of the inputs before calculation.


This can be useful if your kernel size does not evenly divide the height and width of the input features.

This will default to zero.
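
For example, a 4x4 kernel does not evenly divide a hypothetical 15x15 input, so the last three rows and columns would otherwise be ignored; a single ring of zero padding brings them back into a window:

import torch
import torch.nn as nn

x = torch.randn(1, 32, 15, 15)
print(nn.AvgPool2d(kernel_size=4, stride=4, padding=0)(x).shape)  # torch.Size([1, 32, 3, 3])
print(nn.AvgPool2d(kernel_size=4, stride=4, padding=1)(x).shape)  # torch.Size([1, 32, 4, 4])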


The ceil mode argument

ceil_mode=False

controls how average pooling computes the shape of its output. The output size is calculated from the kernel size, stride, padding, and shape of the inputs, and by default the floor of that calculation is taken.

This can be changed to the ceiling by setting ceil_mode=True.
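
Written out as a small helper (pooled_size is just an illustrative name, not a PyTorch function), the calculation for one spatial dimension looks like this:

import math

def pooled_size(size, kernel_size, stride, padding, ceil_mode=False):
    # Output size for one spatial dimension of AvgPool2d
    value = (size + 2 * padding - kernel_size) / stride + 1
    return math.ceil(value) if ceil_mode else math.floor(value)

print(pooled_size(15, 4, 4, 0))                  # 3 - the floor drops the partial window
print(pooled_size(15, 4, 4, 0, ceil_mode=True))  # 4 - the ceiling keeps it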


count_include_pad

count_include_pad=False

becomes relevant if you have added implicit zero padding.


In that case, setting count_include_pad=True will instruct the average pool to include the padded zeros when calculating its averages.
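
Here is a small demonstration on an all-ones 4x4 input with one ring of zero padding; the edge windows average in the padded zeros only when count_include_pad=True:

import torch
import torch.nn as nn

x = torch.ones(1, 1, 4, 4)
include = nn.AvgPool2d(kernel_size=2, stride=2, padding=1, count_include_pad=True)
exclude = nn.AvgPool2d(kernel_size=2, stride=2, padding=1, count_include_pad=False)
print(include(x))  # edge averages are pulled toward zero, e.g. 0.25 in the corners
print(exclude(x))  # every average stays 1.0 because the padded zeros are not counted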


After the average pool layer is set up, we simply need to add it to our forward method.

x = self.avg_pool(x)


One last thing: the input dimension of the fully connected output layer needs to be changed to match the average pool's output, since average pooling changes the shape of layer2's outputs.

self.fully_connected = nn.Linear(32 * 4 * 4, num_classes)

The input dimension is now 32 * 4 * 4 = 512 because average pooling has reduced the height and width of each of the 32 feature maps to 4.


This also needs to be updated in the view call so that it flattens the tensors to the matching shape.

x = x.view(-1, 32 * 4 * 4)
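
With everything in place, a quick sanity check (assuming 32x32 inputs such as CIFAR-10 images, which is what makes layer2's output 16x16) confirms that the shapes line up:

import torch

model = Convolutional()
out = model(torch.randn(8, 3, 32, 32))  # a batch of 8 dummy images
print(out.shape)  # torch.Size([8, 10])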
