I know batch normalization uses different statistics in eval() and train() mode, but when I force the running statistics to equal the batch statistics (by setting momentum=1), the two modes still give me different outputs.
Here’s the code to reproduce what I am talking about:
    import torch
    import torch.nn as nn

    a = torch.tensor([[[1, 2], [3, 4]]]).float()
    b = torch.tensor([[[10, 20], [30, 40]]]).float()
    X = torch.stack((a, b)).float()
    assert X.shape == (2, 1, 2, 2)

    l = nn.BatchNorm2d(1, momentum=1, eps=0).train()  # 1 channel
    l(X)  # populate running_mean and running_var

    def batchnorm(x, u, var):
        return (x - u) / torch.sqrt(var)  # epsilon is 0

    one = batchnorm(X, l.running_mean, l.running_var)

    # batch norm in eval mode
    l.eval()
    two = l(X)

    # batch norm in train mode
    l.train()
    three = l(X)

    assert (torch.abs(one - two) < 0.0001).all()
    assert not (torch.abs(one - three) < 0.0001).all()
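As a sanity check on the premise, here is a minimal sketch (reusing X and l from above) confirming that with momentum=1 a single train-mode forward pass really does copy the batch statistics into the running buffers:

    # With momentum=1, the running buffers should equal the statistics of
    # the one batch seen so far (reshape to (1,) since there is 1 channel).
    assert torch.allclose(l.running_mean, X.mean().reshape(1))
    assert torch.allclose(l.running_var, X.var().reshape(1))  # Tensor.var() defaults to the unbiased estimator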
This code runs with no assertion failing: one matches two, but one and three really are different, even though I think they should be equal.
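To make the mismatch easier to inspect, here is a quick probe on the tensors defined above (again just a sketch):

    # If this prints a (near-)constant tensor, the two modes differ only by
    # a uniform scale factor, i.e. by how the variance is estimated.
    print(one / three)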