BatchNorm2d - Use the PyTorch BatchNorm2d Module to accelerate Deep Network training by reducing internal covariate shift

Batch normalization is a technique that can speed up the training of a neural network and allow it to tolerate larger learning rates.

It does so by reducing internal covariate shift: the phenomenon of each layer's input distribution changing as the parameters of the preceding layers change during training.

More concretely, in the displayed network

```
import torch.nn as nn


class Convolutional(nn.Module):
    def __init__(self, input_channels=3, num_classes=10):
        super(Convolutional, self).__init__()
        self.layer1 = nn.Sequential()
        self.layer1.add_module("Conv1", nn.Conv2d(in_channels=input_channels, out_channels=16, kernel_size=3, padding=1))
        self.layer1.add_module("BN1", nn.BatchNorm2d(num_features=16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True))
        self.layer1.add_module("Relu1", nn.ReLU(inplace=False))
        self.layer2 = nn.Sequential()
        self.layer2.add_module("Conv2", nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, padding=1, stride=2))
        self.layer2.add_module("BN2", nn.BatchNorm2d(num_features=32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True))
        self.layer2.add_module("Relu2", nn.ReLU(inplace=False))
        self.fully_connected = nn.Linear(32 * 16 * 16, num_classes)

    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = x.view(-1, 32 * 16 * 16)
        x = self.fully_connected(x)
        return x
```
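As a quick sanity check on this architecture (a sketch, assuming 32x32 input images, which is what the `32 * 16 * 16` flattening implies since the stride-2 convolution halves 32 to 16), the same layers rebuilt as a single nn.Sequential make the shapes easy to verify:

```
import torch
import torch.nn as nn

# Same layers as the Convolutional class above, rebuilt as one stack
# so a dummy batch can be pushed through end to end.
layers = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),    # 32x32 -> 32x32
    nn.BatchNorm2d(16),
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, padding=1, stride=2),  # 32x32 -> 16x16
    nn.BatchNorm2d(32),
    nn.ReLU(),
    nn.Flatten(),                                  # -> 32 * 16 * 16 = 8192
    nn.Linear(32 * 16 * 16, 10),
)
logits = layers(torch.randn(4, 3, 32, 32))  # a batch of 4 fake 32x32 RGB images
print(logits.shape)  # torch.Size([4, 10])
```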

The second layer

```
x = self.layer2(x)
```

has an expected distribution of inputs coming from the first layer

```
x = self.layer1(x)
```

and its parameters are optimized for this expected distribution.

As the parameters in the first layer are updated

```
x = self.layer1(x)
```

the actual distribution of outputs produced by layer1 drifts away from the expected distribution that layer2's parameters were optimized for.

This is a problem because it can push some of layer2's activations into their saturated regions, which significantly slows down training.

Batch normalization

```
self.layer1.add_module("BN1", nn.BatchNorm2d(num_features=16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True))
```

grants us the freedom to use larger learning rates while not worrying as much about internal covariate shift.

This, in turn, means that our network can be trained slightly faster.

In the displayed network, batch normalization is applied to both the first

```
self.layer1.add_module("BN1", nn.BatchNorm2d(num_features=16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True))
```

and the second layer

```
self.layer2.add_module("BN2", nn.BatchNorm2d(num_features=32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True))
```

with minimal modification from default arguments.

num_features

```
num_features=32
```

is a required argument that tells BatchNorm2d how many channels (features) are in the output of the layer that precedes it.

In the case of layer1 and layer2 in the displayed network

```
out_channels=32
```

this is the number of output channels of their Conv2d modules.

Channels are equivalent to features, but the term channels is more common with image data sets, since the original image arrives with a certain number of color channels.
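As a short sketch, BatchNorm2d expects input shaped (N, C, H, W), and num_features must match the channel dimension C:

```
import torch
import torch.nn as nn

# num_features must equal C in an (N, C, H, W) input.
bn = nn.BatchNorm2d(num_features=32)
x = torch.randn(8, 32, 16, 16)   # 8 images, 32 channels, 16x16 each
y = bn(x)
print(y.shape)  # torch.Size([8, 32, 16, 16]) - normalization preserves shape

# A mismatched channel count would raise an error, e.g.:
# nn.BatchNorm2d(num_features=16)(x)  -> RuntimeError
```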

eps or epsilon

```
eps=1e-05
```

is a small value added to the denominator of the batch normalization calculation.

This is just to improve numerical stability and it should only be modified with good reason.
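The role of eps can be seen by reproducing BatchNorm2d's training-mode calculation by hand (a sketch, using affine=False so only the normalization step remains): each channel is normalized by its batch mean and variance, with eps under the square root guarding against division by zero.

```
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm2d(num_features=3, eps=1e-05, affine=False)
x = torch.randn(4, 3, 8, 8)
y = bn(x)

# Per-channel statistics over the batch and spatial dimensions (N, H, W).
mean = x.mean(dim=(0, 2, 3), keepdim=True)
var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)  # biased variance
manual = (x - mean) / torch.sqrt(var + 1e-05)
print(torch.allclose(y, manual, atol=1e-5))  # True
```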

The BatchNorm function will keep a running estimate of its computed mean and variance during training for use during evaluation of the network.

This can be disabled by setting track_running_stats

```
track_running_stats=True
```

to False, in which case statistics are computed from the current batch during evaluation as well.
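A small sketch of this behavior: during training-mode passes BatchNorm2d updates its running estimates, and after calling eval() it normalizes with those estimates instead of the current batch's statistics (the shifted input distribution below is purely illustrative):

```
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(num_features=4)   # running_mean starts at zeros
for _ in range(10):
    # Training-mode passes on inputs with mean ~5 update the running stats.
    bn(torch.randn(16, 4, 8, 8) * 2 + 5)

bn.eval()                             # evaluation now uses the running estimates
y = bn(torch.randn(16, 4, 8, 8) * 2 + 5)
print(bn.running_mean)                # has moved from 0 toward the batch mean of ~5
```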

The momentum argument

```
momentum=0.1
```

determines the rate at which the running estimates are updated.

If it is set to None, the running estimates are instead computed as a cumulative (simple) average over all batches seen.

Lastly, the affine argument

```
affine=True
```

when set to True, indicates that BatchNorm2d should have learnable per-channel affine parameters (a scale and a shift applied after normalization).

The default value for affine is True.
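These learnable parameters show up as a per-channel weight (the scale, initialized to ones) and bias (the shift, initialized to zeros):

```
import torch.nn as nn

bn = nn.BatchNorm2d(num_features=16, affine=True)
print(bn.weight.shape, bn.bias.shape)         # torch.Size([16]) torch.Size([16])
print([n for n, _ in bn.named_parameters()])  # ['weight', 'bias']

# With affine=False the module has no learnable parameters at all:
bn_plain = nn.BatchNorm2d(num_features=16, affine=False)
print(list(bn_plain.parameters()))            # []
```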
