How does nn.BatchNorm1d() work in eval() mode? I looked up the source code, but it points to F.batch_norm, which in turn points to torch.batch_norm (which has no Python file I can look up).
Fundamentally, I want to do something very simple. Suppose I have an input x, which is passed through nn.BatchNorm1d to get y1 during eval() mode. I want to recreate y1 using the parameters and buffers stored inside the module (running_mean, running_var, and the affine weight gamma and bias beta) via the following:

y2 = gamma * (x - running_mean) / sqrt(running_var + eps) + beta

However, y1 does not match y2. Am I missing something?
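For reference, here is a minimal sketch of that recreation (the feature size, batch size, and warm-up batches are illustrative; eps is the module's own epsilon, 1e-5 by default):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

bn = nn.BatchNorm1d(4)
bn.train()
# run a few batches so the running statistics differ from their defaults
for _ in range(5):
    bn(torch.randn(8, 4))
bn.eval()

x = torch.randn(8, 4)
y1 = bn(x)  # eval-mode output using running statistics

# manual recreation from the stored buffers/parameters
y2 = bn.weight * (x - bn.running_mean) / torch.sqrt(bn.running_var + bn.eps) + bn.bias

print((y1 - y2).abs().max())  # only tiny FP32 rounding differences should remain
```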
How large is the difference between y1 and y2?
If it’s approx. 1e-6 or smaller, you might just see the limited floating point precision of FP32 numbers.
To compare these tensors, you could alternatively use
torch.allclose and specify the absolute or relative error, if necessary.
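For example (the tolerance values here are illustrative, not a recommendation):

```python
import torch

a = torch.randn(10)
b = a + 1e-7  # tiny perturbation, e.g. from a different order of operations

print((a - b).abs().max())                        # raw elementwise difference
print(torch.allclose(a, b))                       # defaults: rtol=1e-5, atol=1e-8
print(torch.allclose(a, b, atol=1e-6, rtol=0.0))  # pure absolute tolerance
```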
I’ve created a manual implementation of
nn.BatchNorm2d some time ago here. Maybe it helps with debugging.
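A minimal sketch of the same idea for nn.BatchNorm2d (not the linked code, just the per-channel broadcasting it requires):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

bn = nn.BatchNorm2d(3)
bn.train()
bn(torch.randn(4, 3, 8, 8))  # update the running statistics once
bn.eval()

x = torch.randn(4, 3, 8, 8)
y1 = bn(x)

# reshape the per-channel buffers to (1, C, 1, 1) so they broadcast over N, H, W
mean = bn.running_mean.view(1, -1, 1, 1)
var = bn.running_var.view(1, -1, 1, 1)
w = bn.weight.view(1, -1, 1, 1)
b = bn.bias.view(1, -1, 1, 1)
y2 = w * (x - mean) / torch.sqrt(var + bn.eps) + b
```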
Thank you so much! Will check the differences exactly, but I did notice one thing.
While the differences were very small, the total accuracy after recreating batchnorm was far worse (i.e. the actual predictions from the network were totally different from the manual predictions I got after recreating batchnorm).
Could you rerun the code using a few seeds and check if you get these bad results randomly with both approaches?
If the error between the manual and the PyTorch batch norm outputs is small, something else, e.g. an unlucky initialization, might be causing these results.
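One way to repeat the comparison over a few seeds (a sketch with a toy module; swap in your own model and evaluation):

```python
import torch
import torch.nn as nn

def max_abs_diff(seed):
    """Max elementwise difference between PyTorch and manual eval-mode batch norm."""
    torch.manual_seed(seed)
    bn = nn.BatchNorm1d(4)
    bn.train()
    bn(torch.randn(8, 4))  # update running statistics once
    bn.eval()
    x = torch.randn(8, 4)
    y_auto = bn(x)
    y_manual = bn.weight * (x - bn.running_mean) / torch.sqrt(bn.running_var + bn.eps) + bn.bias
    return (y_auto - y_manual).abs().max().item()

for seed in range(5):
    print(seed, max_abs_diff(seed))
```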
Sorry for the delay in my response. The results were bad until I normalized the batchnorm weights.
After this, I actually noticed that manually forward-propagating through the weights of my layers/batchnorm (since it already has normalized
running_var variables) yielded a higher accuracy than automatically forward-propagating through the whole net.
Thanks for your help @ptrblck. Typically I’ve only used batchnorm2d, where Glorot initializations can be applied, so I didn’t think to apply any kind of initialization to batchnorm1d.