Most people who struggle with NaNs at the output know that feeding NaN or zero inputs to a model containing layers such as batchnorm can break the model. During training, if you skip the backward pass, the model does not appear to break. However, once you have done a single forward pass with these undesired inputs, eval mode outputs NaN while train mode does not. Is there any explanation for this?
The running stats of batchnorm layers (`running_mean` and `running_var`) will be poisoned by the invalid values and will thus yield invalid outputs during `model.eval()` forward passes.
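Here is a minimal sketch (assuming a standard PyTorch install) that reproduces the effect: a single train-mode forward pass with one NaN element corrupts the running stats, so eval mode then returns NaN even for clean inputs.

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(4)

clean = torch.randn(8, 4)
bad = clean.clone()
bad[0, 0] = float("nan")  # one invalid value is enough

bn.train()
_ = bn(bad)              # forward only, no backward pass needed
print(bn.running_mean)   # running stats now contain NaN

bn.eval()
out = bn(clean)          # eval normalizes with the poisoned running stats
print(torch.isnan(out).any())  # tensor(True)
```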
Thanks for the reply. This was what I guessed. Apart from not feeding zero/NaN inputs, are there any guard rails, such as a way to automatically ignore such samples? Or a mechanism to avoid updating `running_mean` or `running_var`?
I’m not aware of any approach other than making sure these invalid values are filtered out (e.g. via `torch.isfinite(input)`).
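A hedged sketch of such a guard rail: drop non-finite samples before they reach the model. The `filter_finite` helper and the loop variables `model` and `batch` are illustrative names, not part of any PyTorch API.

```python
import torch

def filter_finite(batch: torch.Tensor) -> torch.Tensor:
    # Keep only samples whose every element is finite (no NaN/Inf).
    mask = torch.isfinite(batch).flatten(1).all(dim=1)
    return batch[mask]

# Inside the training loop (sketch):
# batch = filter_finite(batch)
# if batch.numel() == 0:
#     continue  # nothing valid in this batch, skip it
# output = model(batch)
```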