Batch normalization output differs between .eval() and .train() modes even when the running and batch statistics are the same

I know batch normalization uses different statistics in eval() and train() modes, but when
I make those statistics the same, it still gives me different values.

Here’s the code to reproduce what I am talking about:

import torch
import torch.nn.functional as F
import torch.nn as nn

a = torch.tensor([[[1, 2],
                   [3, 4]]])

b = torch.tensor([[[10, 20],
                   [30, 40]]])

X = torch.stack((a, b)).float()
assert X.shape == (2, 1, 2, 2)

l = nn.BatchNorm2d(1, momentum=1, eps=0).train() # 1 channel; momentum=1 copies the batch stats into the running stats
l(X) # forward pass in train mode updates running_mean and running_var

def batchnorm(x, u, var):
    return (x - u) / torch.sqrt(var) # epsilon is 0

one = batchnorm(X,l.running_mean, l.running_var)

# batch norm eval
l.eval()
two = l(X)

# batch norm train
l.train()
three = l(X)

assert (torch.abs(one - two) < 0.0001).all()
assert not (torch.abs(one - three) < 0.0001).all()

This code runs without any assertion failing, i.e. tensors one and three really are
different, but I think they should be equal.

Thank you

Your code returns the expected mismatches:

torch.abs(one - two)
tensor([[[[5.9605e-08, 0.0000e+00],
          [5.9605e-08, 5.9605e-08]]],

        [[[0.0000e+00, 0.0000e+00],
          [0.0000e+00, 0.0000e+00]]]], grad_fn=<AbsBackward0>)

torch.abs(one - three)
tensor([[[[0.0598, 0.0551],
          [0.0504, 0.0457]]],

        [[[0.0176, 0.0293],
          [0.0762, 0.1231]]]], grad_fn=<AbsBackward0>)

The difference is expected: running_var is updated with Bessel’s correction (the unbiased variance), while train mode normalizes with the biased batch variance, as this check shows:

print(X.mean([0, 2, 3]))
# tensor([13.7500])
print(X.var([0, 2, 3], unbiased=False))
# tensor([189.6875])
print(X.var([0, 2, 3], unbiased=False) * X.numel() / (X.numel() - 1))
# tensor([216.7857])
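To make this concrete, here is a minimal self-contained check (reusing the tensors from the question) that the stored running_var is the unbiased variance, while the train-mode output is computed with the biased batch variance:

```python
import torch
import torch.nn as nn

# Same setup as in the question
a = torch.tensor([[[1., 2.], [3., 4.]]])
b = torch.tensor([[[10., 20.], [30., 40.]]])
X = torch.stack((a, b))
assert X.shape == (2, 1, 2, 2)

l = nn.BatchNorm2d(1, momentum=1, eps=0).train()
three = l(X)  # train mode: normalizes with the biased batch variance

# running_var stores the unbiased (Bessel-corrected) variance ...
assert torch.allclose(l.running_var, X.var([0, 2, 3], unbiased=True))

# ... but the train-mode output is normalized with the biased variance
mean = X.mean([0, 2, 3]).view(1, -1, 1, 1)
var = X.var([0, 2, 3], unbiased=False).view(1, -1, 1, 1)
assert torch.allclose(three, (X - mean) / torch.sqrt(var), atol=1e-5)
```

So in train mode the layer normalizes with the biased batch variance, but stores the unbiased one in running_var, which is exactly why `one` (computed from running_var) differs from `three`.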