`F.batch_norm` returns different results in `train` and `eval` mode given the same setup

1. Set the seed
import torch
import torch.nn as nn
import torch.nn.functional as F

from pytorch_lightning.utilities.seed import seed_everything
seed_everything(666)

num=6
x = torch.rand(4,num)
print(x)
>>> tensor([[0.3119, 0.2701, 0.1118, 0.1012, 0.1877, 0.0181],
        [0.3317, 0.0846, 0.5732, 0.0079, 0.2520, 0.5518],
        [0.8785, 0.5281, 0.4961, 0.9791, 0.5817, 0.4875],
        [0.0650, 0.7506, 0.2634, 0.3684, 0.5035, 0.9089]])
2. Randomly initialize a BatchNorm layer
bn = nn.BatchNorm1d(num)

rand = lambda num: torch.rand(num)

weight = rand(num)
bias = rand(num)
mean = rand(num)
var = rand(num)

bn.weight.data = weight.data
bn.bias.data = bias.data
bn.running_mean.data = mean.data
bn.running_var.data = var.data
3. Compare the two forms
bn.eval()
y1 = bn(x)

>>> tensor([[ 0.1871, -0.7274, -0.2401,  0.2894,  0.8241,  0.0806],
        [ 0.2042, -1.5069,  0.6280,  0.2882,  0.8298,  0.6823],
        [ 0.6761,  0.3567,  0.4828,  0.3000,  0.8590,  0.6098],   
        [-0.0260,  1.2915,  0.0450,  0.2926,  0.8521,  1.0849]],
       grad_fn=<NativeBatchNormBackward>)

y2 = F.batch_norm(x, mean, var, weight, bias, eps=1e-5, momentum=0.1, training=False)
>>> tensor([[ 0.1871, -0.7274, -0.2401,  0.2894,  0.8241,  0.0806],
        [ 0.2042, -1.5069,  0.6280,  0.2882,  0.8298,  0.6823],
        [ 0.6761,  0.3567,  0.4828,  0.3000,  0.8590,  0.6098],
        [-0.0260,  1.2915,  0.0450,  0.2926,  0.8521,  1.0849]])

We can see that the two forms return the same results.
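As a quick sanity check (my own addition, assuming the snippets above were run in the same session), the two outputs can be compared numerically:

print(torch.allclose(y1, y2))  # should print True: both paths normalize with the stored running stats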

But if we try `F.batch_norm(..., training=True)`, we get totally different results:

y2 = F.batch_norm(x, mean, var, weight, bias, eps=1e-5, momentum=0.1, training=True)
>>> tensor([[ 0.1582, -0.3523, -0.2571,  0.2823,  0.8138, -0.5995],
        [ 0.1835, -0.9058,  0.7885,  0.2801,  0.8397,  0.7659],
        [ 0.8830,  0.4174,  0.6137,  0.3029,  0.9721,  0.6014],
        [-0.1577,  1.0812,  0.0864,  0.2886,  0.9407,  1.6796]])

My question is: what is the role of `training` in `F.batch_norm`?

I do get the same (expected) results when both approaches use the same mode, i.e. `bn.eval()` with `training=False`, or `bn.train()` with `training=True`.
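For example, a minimal sketch (reusing the variables defined above; `.clone()` avoids mutating the stored stats in place):

bn.train()
y1_train = bn(x)  # normalizes with the batch statistics of x (and updates bn's running stats)
y2_train = F.batch_norm(x, mean.clone(), var.clone(), weight, bias,
                        training=True, momentum=0.1, eps=1e-5)
print(torch.allclose(y1_train, y2_train))  # should print True: both normalize with batch stats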

During training the batch statistics are used to normalize the input (and the running stats are updated), while during eval the running stats are used to normalize the input.
The docs explain this as well and also give the formula for how the running stats are updated.
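As an illustration (a rough sketch, not taken from the docs; it assumes `x`, `mean`, `var`, `weight` and `bias` still hold the values from the setup above), both modes can be reproduced manually:

# Training mode: normalize with the batch statistics of x
batch_mean = x.mean(dim=0)
batch_var = x.var(dim=0, unbiased=False)   # the biased variance is used for normalization
y_train_manual = (x - batch_mean) / torch.sqrt(batch_var + 1e-5) * weight + bias

# Eval mode: normalize with the stored running statistics
y_eval_manual = (x - mean) / torch.sqrt(var + 1e-5) * weight + bias

# Running-stat update performed in training mode (the unbiased batch variance is used here)
momentum = 0.1
new_running_mean = (1 - momentum) * mean + momentum * batch_mean
new_running_var = (1 - momentum) * var + momentum * x.var(dim=0, unbiased=True)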