For some reason the output contains NaN values. Could you break out of the training loop after you’ve encountered the first NaN and check all parameters of the model?
E.g. you could print their abs().max() via:

```python
for name, param in model.named_parameters():
    print(name, param.abs().max())
```
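For reference, a minimal sketch of what breaking out at the first NaN could look like (the model, optimizer, criterion, and data below are placeholders for illustration; substitute your own training setup):

```python
import torch
import torch.nn as nn

# Placeholder setup for illustration only
model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()
data = torch.randn(32, 10)
target = torch.randn(32, 2)

for epoch in range(100):
    optimizer.zero_grad()
    output = model(data)
    loss = criterion(output, target)
    # Stop at the first NaN/Inf loss so the current state can be inspected
    if not torch.isfinite(loss):
        print(f"non-finite loss in epoch {epoch}: {loss.item()}")
        break
    loss.backward()
    optimizer.step()
```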
If this looks alright, you could repeat the last forward iteration (since the input contains valid values) and check all intermediate activations to narrow down which layer creates the NaN outputs, using forward hooks as described here.
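A minimal forward-hook sketch might look like this (the model is a placeholder; the isinstance check is there because some modules return tuples rather than tensors, which would need extra handling):

```python
import torch
import torch.nn as nn

# Placeholder model for illustration
model = nn.Sequential(
    nn.Linear(10, 10),
    nn.ReLU(),
    nn.Linear(10, 2),
)

def make_nan_hook(name):
    def hook(module, inputs, output):
        # Report modules whose output contains NaNs
        if isinstance(output, torch.Tensor) and torch.isnan(output).any():
            print(f"NaN in output of {name} ({module.__class__.__name__})")
    return hook

# Register a hook on every submodule so each layer's output is checked
for name, module in model.named_modules():
    module.register_forward_hook(make_nan_hook(name))

# Replay the last (valid) input to see which layer first produces NaNs
x = torch.randn(1, 10)
out = model(x)
```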