# Help understanding Batchnorm

I have a PyTorch model consisting of a `Conv2d` followed by a `BatchNorm2d`, and I am printing the output of each layer in the forward pass.

I cannot seem to understand the result of the output of BatchNorm based on the values of weight and bias it holds.

The following are the outputs as printed by PyTorch (the conv output, which is also the input to BatchNorm, followed by the BatchNorm output):

``````
tensor([[[[-0.0403,  0.0103,  0.0185],
          [ 0.0240,  0.0535,  0.0137],
          [ 0.0233,  0.0239, -0.0202]],

         [[-0.1044, -0.1664, -0.2347],
          [-0.1708, -0.2092, -0.2356],

tensor([[[[-1.6799, -0.0496,  0.2127],
          [ 0.3922,  1.3428,  0.0598],
          [ 0.3674,  0.3883, -1.0339]],

         [[ 0.4344,  0.1697, -0.1216],
          [ 0.1510, -0.0127, -0.1253],
``````

The outputs were printed from the `forward` method as:

``````
x1 = self.conv1(x)
print(x1)
x2 = self.bn(x1)
print(x2)
``````

Now, when I print the weight and bias of the BatchNorm layer, it shows this:

``````Parameter containing:

Parameter containing:

``````

If BatchNorm computes `weight * input + bias`, then the first output value should have been `(0.8352 * -0.0403) + 0 = -0.0336`, but it shows `-1.6799`.

Could someone please explain? I ask because one of my colleagues pointed this out. In our internal code the output is indeed -0.033 for the first index, so we wanted to understand the reasoning behind PyTorch's value and whether other factors are involved.


I think I figured this out. Can someone confirm?

It basically normalizes the conv output per channel, so that we have C means and variances. It adjusts the conv output by subtracting the mean (for that channel) and dividing by the standard deviation (the square root of the variance plus a small eps, for that channel), and then multiplies the result by the BatchNorm weight and adds the bias for that channel.
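This description can be checked numerically. A minimal sketch (the input shape and channel count are arbitrary assumptions) that reproduces `nn.BatchNorm2d` by hand in training mode:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(4, 3, 5, 5)   # N, C, H, W
bn = nn.BatchNorm2d(3)        # in train() mode by default

# Per-channel statistics over the N, H and W dimensions.
mean = x.mean(dim=(0, 2, 3), keepdim=True)
var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)  # biased variance

# Normalize, then apply the affine parameters (weight and bias).
x_hat = (x - mean) / torch.sqrt(var + bn.eps)
manual = x_hat * bn.weight.view(1, -1, 1, 1) + bn.bias.view(1, -1, 1, 1)

out = bn(x)
print(torch.allclose(out, manual, atol=1e-5))  # True
```

Since `weight` starts at 1 and `bias` at 0, the output right after initialization is just the normalized input, which is why a value like `-0.0403` can map to `-1.6799`.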


Yes, that’s the method applied in `train()` mode.

If you call `model.eval()`, the running estimates will be used to normalize the input instead of the current batch statistic.
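A small sketch of that difference (shapes are arbitrary): the same input is normalized with the batch statistics in `train()` mode but with the running estimates in `eval()` mode, so the outputs generally differ:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm2d(2)
x = torch.randn(4, 2, 3, 3)

bn.train()
out_train = bn(x)   # uses this batch's mean/var and updates the running stats

bn.eval()
out_eval = bn(x)    # uses the running estimates instead

print(torch.allclose(out_train, out_eval))  # False: different statistics
```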

Thanks! I am trying to ensure that BatchNorm stays in training mode but is frozen, because other layers will still be updated. Do I need to do this to the module after the net object is created?

net.bn.weight.requires_grad = False
net.bn.bias.requires_grad = False  # the bias is a parameter too
net.bn.train()

``````

If you don’t want to train the affine parameters at all (`weight` and `bias`), you could just initialize the batch norm layer with `affine=False`.
Otherwise, to disable their updates temporarily, you could set the `.requires_grad` attribute of both parameters to `False`, as in your example.
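A sketch of both options, assuming a small hypothetical model with a `bn` attribute as in the question:

```python
import torch.nn as nn

# Option 1: no affine parameters at all.
bn_no_affine = nn.BatchNorm2d(2, affine=False)
print(bn_no_affine.weight, bn_no_affine.bias)  # None None

# Option 2: keep the parameters but freeze them temporarily.
class Net(nn.Module):  # hypothetical model mirroring the question
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 2, 3)
        self.bn = nn.BatchNorm2d(2)

    def forward(self, x):
        return self.bn(self.conv1(x))

net = Net()
net.bn.weight.requires_grad = False
net.bn.bias.requires_grad = False
```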


Ok, I will try that. But is `net.bn.train()` absolutely required so that the layer does not behave as if it were in eval mode? If I am not wrong, all modules are in `train()` mode by default, so maybe it is not needed. Conversely, if I needed to do inference, would I necessarily have to call `net.bn.eval()`?

Yes, that’s right. All modules are in training mode by default after initialization.
Sorry, I had overlooked the last line of your code.

For inference, I would rather call `net.eval()`, which will set all modules recursively to evaluation mode.
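A quick sketch (with an arbitrary two-layer model) showing that `net.eval()` sets every submodule, including BatchNorm, to evaluation mode:

```python
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 2, 3), nn.BatchNorm2d(2))
model.eval()  # recursively sets training=False on all submodules
print(all(not m.training for m in model.modules()))  # True
```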
