BatchNorm returning NaN values

I’m having an issue with the BatchNorm2d layer of my CNN, where the output ends up being all NaNs. I’ve narrowed this down to the variance of the preceding Conv2d layer’s output being 0, which causes a NaN in the normalization calculation. I’ve computed the variance of the Conv2d output directly, so I don’t think it’s due to precision issues.
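For anyone who wants to reproduce the check, here’s a minimal sketch using a forward hook (the layer sizes are placeholders, not my actual model):

```python
import torch
import torch.nn as nn

# Minimal sketch of the variance check (layer sizes are placeholders).
# BatchNorm2d normalizes each channel over the (N, H, W) dimensions,
# so that's the variance worth inspecting.
conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
bn = nn.BatchNorm2d(16)

def check_variance(module, inputs, output):
    var = output.var(dim=(0, 2, 3), unbiased=False)
    print("zero-variance channels:", (var == 0).sum().item())
    print("NaNs in conv output:", torch.isnan(output).any().item())

conv.register_forward_hook(check_variance)

x = torch.randn(8, 3, 32, 32)
y = bn(conv(x))
print("NaNs after BatchNorm:", torch.isnan(y).any().item())
```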

Things I’ve tried:

  • Changing the epsilon value (from the default eps=1e-5; see the sketch after this list). This caused the loss to explode and vary wildly from batch to batch.
  • Removing padding from Conv2d. Tried this per ChatGPT advice; my mistake for even asking the thing.
  • Changing the filter size of Conv2d. The BatchNorm layer still outputs NaNs.
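For the epsilon experiment, this is the kind of change I mean (channel counts are placeholders). Since eps is added to the batch variance inside the denominator, raising it is the usual guard against a tiny variance:

```python
import torch.nn as nn

# Placeholder block showing where eps lives. BatchNorm2d computes
# (x - mean) / sqrt(var + eps), so eps bounds the denominator away
# from zero even when the batch variance collapses to 0.
block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16, eps=1e-3),  # default is 1e-5
)
```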

None of this has fixed the issue. I’ve seen this brought up before, but none of the suggested fixes work for me.

Any advice would be greatly appreciated!

Thanks,
-Orbital

Are you using 16-bit precision?
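In half precision, the batch statistics can overflow long before anything looks suspicious in float32. A contrived illustration, using the naive E[x²] − E[x]² variance formula purely for demonstration (not necessarily what the BatchNorm kernel does internally):

```python
import torch

# float16 overflows past ~65504, so squares of moderately large
# activations become inf; inf - inf then yields NaN, which propagates
# through the normalized output for the whole batch.
x = torch.tensor([300.0, 400.0], dtype=torch.float16)

sq_mean = (x * x).mean()   # 90000 and 160000 both overflow to inf
mean_sq = x.mean() ** 2    # 350**2 = 122500 also overflows to inf
print(sq_mean - mean_sq)   # tensor(nan, dtype=torch.float16)
```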

I would argue that the variance being 0 is a problem in itself.
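A zero batch variance means that channel’s output is identical for every element in the batch, which usually points at something upstream: dead units, a saturated activation, or degenerate input data. One quick way to flag the affected channels (the tensor here just stands in for your captured Conv2d output):

```python
import torch

# Placeholder standing in for the captured Conv2d output.
conv_out = torch.zeros(8, 16, 32, 32)

# A channel whose min equals its max over (N, H, W) is constant across
# the batch -- exactly the zero-variance case worth investigating.
flat = conv_out.transpose(0, 1).reshape(conv_out.size(1), -1)
constant = flat.min(dim=1).values == flat.max(dim=1).values
print("constant channels:", constant.nonzero().flatten().tolist())
```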