I searched before asking: I saw a lot of related questions, but none of them were complete enough, so I want to provide a comprehensive summary and in-depth discussion here.
- First we should clarify how the BN layer is calculated:
Reference: BatchNorm2d — PyTorch master documentation
The BN layer computes the mean and variance of each batch of input data and normalizes the batch with them, then shifts the result to a new data distribution through γ and β, where γ and β are learnable parameters.
- According to my own experiments, setting eval mode on the BN layer does not prevent the BN layer's parameters from being updated; it is just different behavior (see the first sketch at the end of this post).
This is also reflected in many of the responses, e.g.:
What is the meaning of the training attribute?model.train() and model.eval()? - #2 by ptrblck
Does gradients change in Model.eval()? - #2 by ptrblck
What does model.eval() do for batchnorm layer? - #2 by smth
- Problem:
In model.train(), we normalize with the statistics of the current input batch (its mean and var) and apply the γ and β that are being trained to shift to the new data distribution.
In model.eval(), we normalize with the statistics tracked during training (the running mean and var), still applying the learned γ and β (the second sketch at the end of this post illustrates this). Right?
I’m not sure whether my insights are correct; feel free to correct me!
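To back up the claim above about eval mode, here is a minimal sketch (my own, not from the linked threads) showing that `.eval()` does not freeze the learnable parameters of a BatchNorm layer: γ (`weight`) and β (`bias`) still receive gradients and are updated by the optimizer; only the statistics used for normalization change.

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(3)
opt = torch.optim.SGD(bn.parameters(), lr=0.1)

bn.eval()  # eval mode: normalize with running_mean / running_var

x = torch.randn(8, 3, 4, 4)
out = bn(x)
out.sum().backward()

print(bn.weight.grad is not None)         # True: gamma still gets a gradient
before = bn.weight.detach().clone()
opt.step()
print(torch.allclose(before, bn.weight))  # typically False: gamma was updated
```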
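And a second sketch, again just my own illustration, contrasting which statistics are used: in train mode the batch mean/var normalize the input and the running statistics are updated, while in eval mode the running statistics are used for normalization and are left untouched.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm2d(1, momentum=0.1)
x = torch.randn(16, 1, 8, 8) * 5 + 3      # batch with mean ≈ 3, std ≈ 5

# train mode: normalize with the batch statistics and update the running ones
bn.train()
out_train = bn(x)
print(out_train.mean().item(), out_train.std().item())  # ≈ 0 and ≈ 1
print(bn.running_mean, bn.running_var)  # nudged from (0, 1) toward (3, 25)

# eval mode: normalize with the running statistics, which are not updated here
bn.eval()
out_eval = bn(x)
print(out_eval.mean().item())  # clearly not ≈ 0: running_mean is still far from 3
print(bn.running_mean, bn.running_var)  # unchanged by the eval forward pass
```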