How are running_mean and running_var used in the code?
I mean, in which cases are they used?
I am experiencing the following issue:
I have 2 bn layers, one in each of 2 models. The input, weight, and bias are all the same, and the only difference is running_mean and running_var. In the model.train() case, the outputs of those 2 bn layers are different. Why?
In the default setup the running stats are updated during training (i.e. if the model is in train() mode) using the mentioned update formula. The input batches are normalized using the batch statistics.
During validation (i.e. if the model is in eval() mode) the running stats will be used to normalize the input.
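To make the two modes concrete, here is a minimal sketch that compares a bn layer against a manual computation. It assumes a fresh nn.BatchNorm1d with default settings (momentum=0.1, weight initialized to 1, bias to 0); the tensor shapes and variable names are made up for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(3)            # defaults: momentum=0.1, weight=1, bias=0
x = torch.randn(8, 3) * 2 + 5     # batch with non-zero mean and non-unit variance

with torch.no_grad():
    # train(): the input is normalized with the statistics of the current batch
    bn.train()
    out_train = bn(x)
    batch_mean = x.mean(dim=0)
    batch_var = x.var(dim=0, unbiased=False)   # biased variance is used for normalization
    manual_train = (x - batch_mean) / torch.sqrt(batch_var + bn.eps)
    train_matches = torch.allclose(out_train, manual_train, atol=1e-5)

    # The same forward pass also updated the running stats:
    # running = (1 - momentum) * running + momentum * batch_stat
    expected_mean = 0.9 * torch.zeros(3) + 0.1 * batch_mean
    expected_var = 0.9 * torch.ones(3) + 0.1 * x.var(dim=0, unbiased=True)
    running_updated = (
        torch.allclose(bn.running_mean, expected_mean, atol=1e-5)
        and torch.allclose(bn.running_var, expected_var, atol=1e-5)
    )

    # eval(): the input is normalized with the stored running stats instead
    bn.eval()
    out_eval = bn(x)
    manual_eval = (x - bn.running_mean) / torch.sqrt(bn.running_var + bn.eps)
    eval_matches = torch.allclose(out_eval, manual_eval, atol=1e-5)

print(train_matches, running_updated, eval_matches)
```

Note that the running variance is updated with the unbiased batch variance, while the normalization in train() uses the biased one.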
If both layers get the same input and are both in train() mode, their outputs will also be equal, since the batch statistics are used for normalization and the running stats are only updated. I would guess that the inputs differ, so you should compare them.
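A quick way to check this is a small sketch like the following, which assumes two nn.BatchNorm1d layers with identical (default) weight and bias and deliberately different running stats. The values 3.0 and 7.0 are arbitrary and only serve the illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn_a = nn.BatchNorm1d(4)
bn_b = nn.BatchNorm1d(4)

# Same weight and bias (defaults), but different running stats (arbitrary values)
with torch.no_grad():
    bn_b.running_mean.fill_(3.0)
    bn_b.running_var.fill_(7.0)

x = torch.randn(16, 4)            # the same input is fed to both layers

with torch.no_grad():
    # train(): only the batch statistics matter, so the outputs agree
    bn_a.train()
    bn_b.train()
    same_in_train = torch.allclose(bn_a(x), bn_b(x))

    # eval(): the differing running stats are used, so the outputs differ
    bn_a.eval()
    bn_b.eval()
    same_in_eval = torch.allclose(bn_a(x), bn_b(x))

print(same_in_train, same_in_eval)
```

If the same comparison on your two models gives different outputs in train() mode, checking torch.equal on the two inputs (and on weight and bias) would be the next step.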