Hey,

I know batch normalization uses different statistics in eval() mode and train() mode, but when I make those statistics the same, it still gives me different values.

Here’s the code to reproduce what I am talking about:

```
import torch
import torch.nn as nn

a = torch.tensor([[[1., 2.],
                   [3., 4.]]])
b = torch.tensor([[[10., 20.],
                   [30., 40.]]])
X = torch.stack((a, b))
assert X.shape == (2, 1, 2, 2)

l = nn.BatchNorm2d(1, momentum=1, eps=0).train()  # 1 channel
l(X)  # update running_mean and running_var with this batch's statistics

def batchnorm(x, u, var):
    return (x - u) / torch.sqrt(var)  # epsilon is 0

one = batchnorm(X, l.running_mean, l.running_var)

# batch norm in eval mode
l.eval()
two = l(X)

# batch norm in train mode
l.train()
three = l(X)

assert (torch.abs(one - two) < 1e-4).all()
assert not (torch.abs(one - three) < 1e-4).all()
```

This code runs without errors, i.e. tensors **one** and **three** really are different, even though I think they should be equal: with momentum=1 the running statistics are exactly the statistics of the last batch, so normalizing with them by hand should reproduce the train-mode output.
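While debugging, I also compared the two variance conventions on the same eight values, in case the mismatch comes from how the variance is estimated (just a minimal check, assuming the variance estimate is the relevant quantity):

```python
import torch

# the eight values from the batch above, flattened
X = torch.tensor([1., 2., 3., 4., 10., 20., 30., 40.])

biased = X.var(unbiased=False)   # divides by n
unbiased = X.var(unbiased=True)  # divides by n - 1

print(biased.item(), unbiased.item())  # 189.6875 vs ≈216.7857
```

The two estimates clearly differ on a batch this small, which is about the size of the discrepancy I see between **one** and **three**.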

Thank you