Most people struggling with NaNs at the output know that feeding NaN or zero inputs to a model containing layers such as batchnorm can break the model. During training, the model does not break down if you skip the backward pass. However, once you have done even a single forward pass with these undesired inputs, eval mode outputs NaN while train mode does not. Any explanation for this?
The running stats of the batchnorm layers (running_mean and running_var) will be poisoned by the invalid values and will thus yield invalid outputs during model.eval() forward passes.
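A minimal sketch of the mechanism described above, using a single standalone BatchNorm1d layer in place of a full model (the layer size and batch shapes are illustrative):

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(3)

# A clean forward pass in train mode updates the running stats normally.
bn.train()
_ = bn(torch.randn(8, 3))
clean_stats_ok = not torch.isnan(bn.running_mean).any().item()

# One forward pass with NaN input -- no backward pass needed -- poisons the
# stats, because train mode updates running_mean/running_var in-place.
_ = bn(torch.full((8, 3), float("nan")))
poisoned = torch.isnan(bn.running_mean).any().item()

# In eval mode the layer normalizes with the poisoned running stats, so the
# output is NaN even for perfectly valid inputs.
bn.eval()
eval_has_nan = torch.isnan(bn(torch.randn(8, 3))).any().item()
```

Note that in train mode the layer normalizes with the current batch statistics, which is why training forward passes on valid batches still look fine after the poisoning.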
Thanks for the reply. This was what I guessed. Apart from not feeding zero/NaN inputs, are there any guard rails, such as a way to automatically ignore such samples? Or a mechanism to avoid updating the running stats when they appear?
I’m not aware of any other approaches besides making sure these invalid values are filtered out (e.g. via torch.isnan or torch.isfinite checks on the inputs before the forward pass).
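One way to implement such a guard rail is to drop invalid samples before the forward pass and skip the step entirely if nothing valid remains, so the running stats are never updated from NaN/inf values. The helper name filter_valid and the shapes below are illustrative, not part of any PyTorch API:

```python
import torch
import torch.nn as nn

def filter_valid(batch: torch.Tensor) -> torch.Tensor:
    # Keep only rows whose every element is finite (not NaN and not +/-inf).
    keep = torch.isfinite(batch).all(dim=1)
    return batch[keep]

bn = nn.BatchNorm1d(3)
bn.train()

batch = torch.randn(8, 3)
batch[2] = float("nan")    # one fully corrupted sample
batch[5, 0] = float("inf") # one sample with a single bad element

valid = filter_valid(batch)
if valid.shape[0] > 1:  # skip the step if too few valid samples remain
    _ = bn(valid)

stats_ok = not torch.isnan(bn.running_mean).any().item()
```

The size check also sidesteps the error BatchNorm1d raises in train mode when a batch contains only one sample. The same filtering could be done in the Dataset or collate function instead, which keeps the model code untouched.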