Batch normalization differs between .eval() and .train() modes even when running and batch statistics are the same

Hey,
I know batch normalization uses different statistics in eval() mode and train() mode, but when
I make those statistics the same, it still gives me different values.

Here’s the code to reproduce what I am talking about:

import torch
import torch.nn.functional as F
import torch.nn as nn

a = torch.tensor([[[1,2],
                     [3,4]]]).float()

b = torch.tensor([[[10,20],
                     [30,40]]]).float()

X = torch.stack((a,b)).float() 
assert X.shape == (2,1,2,2)

l = nn.BatchNorm2d(1, momentum=1, eps=0).train() # 1 channel
l(X) # update running_mean and running_var

def batchnorm(x, u, var):
    return (x - u) / torch.sqrt(var) # epsilon is 0

one = batchnorm(X,l.running_mean, l.running_var)

# batch norm eval
l.eval()
two = l(X)

# batch norm train
l.train()
three = l(X)


assert (torch.abs(one - two) < 0.0001).all()
assert not (torch.abs(one - three) < 0.0001).all()

This code runs without raising an assertion error, because tensors one and three really are
different, but I think they should be equal.

Thank you

Your code returns the expected mismatches:

torch.abs(one - two)
Out[15]: 
tensor([[[[5.9605e-08, 0.0000e+00],
          [5.9605e-08, 5.9605e-08]]],


        [[[0.0000e+00, 0.0000e+00],
          [0.0000e+00, 0.0000e+00]]]], grad_fn=<AbsBackward0>)

torch.abs(one - three)
Out[16]: 
tensor([[[[0.0598, 0.0551],
          [0.0504, 0.0457]]],


        [[[0.0176, 0.0293],
          [0.0762, 0.1231]]]], grad_fn=<AbsBackward0>)

The difference is expected, as running_var is updated with the unbiased variance (i.e. with Bessel’s correction), as seen here.

print(l.running_mean)
# tensor([13.7500])
print(X.mean([0, 2, 3]))
# tensor([13.7500])

print(l.running_var)
# tensor([216.7857])
print(X.var([0, 2, 3], unbiased=False))
# tensor([189.6875])
print(X.var([0, 2, 3], unbiased=False) * X.numel() / (X.numel() - 1))
# tensor([216.7857])
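In other words, train() mode normalizes the batch with the *biased* variance, while the *unbiased* estimate is what gets stored in running_var — so a manual batchnorm using running_var can never match the train-mode output. A minimal sketch confirming this (same toy input as above, assuming default affine parameters, i.e. weight=1 and bias=0):

```python
import torch
import torch.nn as nn

# same toy input as above: shape (2, 1, 2, 2)
a = torch.tensor([[[1., 2.], [3., 4.]]])
b = torch.tensor([[[10., 20.], [30., 40.]]])
X = torch.stack((a, b))

l = nn.BatchNorm2d(1, momentum=1, eps=0).train()
out_train = l(X)

# train mode normalizes with the biased (unbiased=False) batch variance
mean = X.mean([0, 2, 3])
var_biased = X.var([0, 2, 3], unbiased=False)
manual = (X - mean) / torch.sqrt(var_biased)

assert torch.allclose(out_train, manual, atol=1e-4)

# ...while running_var stores the unbiased estimate
assert torch.allclose(l.running_var, X.var([0, 2, 3], unbiased=True))
```

So tensor three (train mode) matches a manual batchnorm with the biased batch variance, and tensor one/two (eval mode) match the unbiased running_var — both behaviors are correct, they just use different variance estimates.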