How does nn.BatchNorm1d() work in eval() mode? I looked up the source code, but it points to F.batch_norm, which in turn points to torch.batch_norm (which has no file I can look up).
Fundamentally, I want to do something very simple. Suppose I have an input x, which is passed through nn.BatchNorm1d to get y1 during eval() mode. I want to recreate y1 using the parameters (gamma, bias, running_mean, running_var, where gamma and bias are stored inside the module’s parameters variable) that are inside nn.BatchNorm1d via the following:
y2 = gamma * (x - running_mean) / sqrt(running_var) + bias
However, y1 != y2. Am I missing something?
Thank you!
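For reference, here is a minimal, self-contained sketch of the recreation being described (the feature size, batch size, and data below are made up for illustration). One detail worth noting is that PyTorch also adds the module's eps (default 1e-5) inside the sqrt, which the formula above omits:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical setup: a BatchNorm1d over 4 features.
bn = nn.BatchNorm1d(4)
bn.train()
_ = bn(torch.randn(32, 4))  # one training pass to update running stats
bn.eval()

x = torch.randn(8, 4)
y1 = bn(x)  # eval-mode output

# Manual recreation: gamma is bn.weight, bias is bn.bias.
# Note the bn.eps term inside the sqrt.
y2 = bn.weight * (x - bn.running_mean) / torch.sqrt(bn.running_var + bn.eps) + bn.bias

print((y1 - y2).abs().max())  # difference on the order of FP32 precision
```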
How large is the difference between y1 and y2?
If it’s approx. 1e-6 or smaller, you might just see the limited floating point precision of FP32 numbers.
To compare these tensors, you could alternatively use torch.allclose and specify the absolute or relative error, if necessary.
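As a small illustration of this comparison (the tensors here are made up):

```python
import torch

y1 = torch.tensor([1.0, 2.0, 3.0])
y2 = y1 + 1e-6  # tiny FP32-level discrepancy

# allclose checks |y1 - y2| <= atol + rtol * |y2| elementwise.
print(torch.allclose(y1, y2))                  # True with defaults (rtol=1e-5, atol=1e-8)
print(torch.allclose(y1, y2, rtol=0, atol=0))  # False: exact equality required
```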
I’ve created a manual implementation of nn.BatchNorm2d some time ago here. Maybe it will help in debugging.
Thank you so much! Will check the differences exactly, but I did notice one thing.
While the differences were very small, the total accuracy after recreating batchnorm was far worse (i.e., the predictions from the network were totally different from the manual predictions I got after recreating batchnorm).
Could you rerun the code with a few seeds and check whether you get these bad results randomly with both approaches?
If the error between the manual and the PyTorch batch norm output is small, something else, e.g. an unlucky initialization, might yield these results.
Sorry for the delay in my response. The results were bad until I normalized the batchnorm weights.
After this, I actually noticed that manually forward propagating through the weights of my layers/batchnorm (since it already has the normalized running_mean and running_var variables) yielded a higher accuracy than automatically forward propagating through the whole net.
Thanks for your help @ptrblck. Typically I’ve only used batchnorm2d, where glorot initializations can be applied, so I didn’t think to apply any kind of initialization to batchnorm1d.