Recreating nn.BatchNorm1d() manually given its trained parameters

How does nn.BatchNorm1d() work in eval() mode? I looked up the source code, but it points to F.batch_norm, which in turn points to torch.batch_norm (which has no Python file I can look up).

Fundamentally, I want to do something very simple.

Suppose I have an input x, which is passed through an nn.BatchNorm1d module in eval() mode to get y1. I want to recreate y1 as y2 using the parameters that are inside nn.BatchNorm1d (gamma, bias, running_mean, running_var, where gamma and bias are stored in the module's parameters) via the following:

y2 = gamma * (x - running_mean) / (sqrt(running_var)) + bias
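
In code, the comparison looks roughly like this (a minimal sketch with made-up shapes; gamma and bias correspond to the module's weight and bias, and note that the module also adds its eps, default 1e-5, inside the square root):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

bn = nn.BatchNorm1d(4)   # pretend this module has already been trained
bn.eval()

x = torch.randn(8, 4)    # [batch_size, num_features]
y1 = bn(x)               # reference output from the module

# manual recreation from the module's parameters and buffers
gamma = bn.weight
bias = bn.bias
running_mean = bn.running_mean
running_var = bn.running_var

# the module adds eps (default 1e-5) inside the square root
y2 = gamma * (x - running_mean) / torch.sqrt(running_var + bn.eps) + bias

print((y1 - y2).abs().max())
```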

However, y1 != y2. Am I missing something?

Thank you!

How large is the difference between y1 and y2?
If it's approx. 1e-6 or smaller, you might just be seeing the limited floating point precision of FP32 numbers.
To compare these tensors, you could alternatively use torch.allclose and specify the absolute or relative tolerance, if necessary.
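
For example, something like this (the tolerance values are just placeholders):

```python
import torch

y1 = torch.randn(8, 4)
y2 = y1 + 1e-7 * torch.randn(8, 4)   # simulate a tiny FP32 mismatch

# True if y1 and y2 match within the given relative/absolute tolerances
print(torch.allclose(y1, y2, rtol=1e-5, atol=1e-6))

# the largest absolute difference is often useful to inspect directly
print((y1 - y2).abs().max())
```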

I've created a manual implementation of nn.BatchNorm2d some time ago here. Maybe it helps with debugging. :wink:
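
Not the exact code from the linked post, but the idea is along these lines (a minimal sketch for nn.BatchNorm2d in training mode, where the batch statistics are computed per channel over the N, H, W dimensions):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

bn = nn.BatchNorm2d(3)
bn.train()

x = torch.randn(4, 3, 8, 8)    # [N, C, H, W]
y_ref = bn(x)

# per-channel batch statistics (biased variance, as used for normalization)
mean = x.mean(dim=(0, 2, 3), keepdim=True)
var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)

y_manual = (x - mean) / torch.sqrt(var + bn.eps)
y_manual = y_manual * bn.weight[None, :, None, None] + bn.bias[None, :, None, None]

print((y_ref - y_manual).abs().max())   # should be tiny (FP32 noise)
```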


Thank you so much! :slight_smile: I will check the exact differences, but I did notice one thing.

While the differences were very small, the overall accuracy after recreating batchnorm was far worse (i.e. the actual predictions from the network were totally different from the predictions I got with the manual batchnorm).

Could you rerun the code with a few different seeds and check whether you get these bad results randomly with both approaches?
If the error between the manual and the PyTorch batch norm output is small, something like an unlucky initialization might be causing these results.

Sorry for the delay in my response. The results were bad until I normalized the batchnorm weights.

After this, I actually noticed that manually forward propagating through the weights of my layers/batchnorm (since it already has the normalized running_mean and running_var buffers) yielded a higher accuracy than automatically forward propagating through the whole net.

Thanks for your help @ptrblck. Typically I've only used batchnorm2d, where Glorot initializations can be applied, so I didn't think to apply any kind of initialization to batchnorm1d.
