I’ve found that in my neural network I’m getting non-NaN losses with NaN gradients. I’m using Adam with default parameters, and there is nothing fancy in my network. Has anyone come across such an issue? I’ve always known that NaN losses cause NaN gradients, but this is a bit odd.
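Many operations could give you NaN in the backward pass even with non-NaN values in the forward pass. For example, sqrt at 0: its gradient there is infinite, and once that infinity is multiplied by a zero gradient from another op, the result is NaN.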
You will need to find where the NaN first appears in the backward pass to be sure.
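A minimal sketch of the sqrt case, assuming a current PyTorch release (the input value is made up for illustration): the forward result is a finite 0, but the chain rule multiplies an infinite derivative by zero, so the gradient comes out as NaN.

```python
import torch

# A single input sitting exactly at 0 (a made-up value for illustration).
x = torch.tensor(0.0, requires_grad=True)

# Forward: x**2 = 0 and sqrt(0) = 0, so the "loss" is finite.
loss = torch.sqrt(x ** 2)
loss.backward()

# Backward: d sqrt(u)/du = 1 / (2 * sqrt(u)) is inf at u = 0, and
# d(x**2)/dx = 2 * x is 0, so the chain rule computes inf * 0 = NaN.
print(loss.item())    # 0.0
print(x.grad.item())  # nan
```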
If you’re using master, you can use anomaly detection to get that information.
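Anomaly detection is available in regular releases as the context manager `torch.autograd.detect_anomaly()` (or globally via `torch.autograd.set_detect_anomaly(True)`). A rough sketch, assuming a recent PyTorch and reusing a made-up sqrt(x**2) computation: with it enabled, backward raises a RuntimeError naming the backward function that produced the NaN, and the forward stack trace of the op that created it is reported as well. It slows things down, so it is meant for debugging only.

```python
import torch

x = torch.tensor(0.0, requires_grad=True)

# Inside detect_anomaly(), autograd checks the output of every backward
# function and raises as soon as one of them contains NaN. Ops created in
# this block also record their forward stack trace for the error report.
caught = None
with torch.autograd.detect_anomaly():
    loss = torch.sqrt(x ** 2)  # finite forward, NaN backward
    try:
        loss.backward()
    except RuntimeError as err:
        caught = err

# The message names the backward node that returned NaN.
print(caught)
```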
I don’t seem to have it. What is master, anyway?
Also, while debugging, I’ve noticed gradients of the order of 1e-19. Could that be a problem?
What I call master is the version of PyTorch built from the current master branch on GitHub, not a release.
It depends on what operations you are doing; 0/0 will give you NaN, for example.
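One place where this can bite even though the forward looks safe, sketched under the assumption of a recent PyTorch (the name `safe_inv` and the input values are hypothetical): masking a division with `torch.where`. The forward never uses the inf, but backward still multiplies a zero gradient by the infinite derivative of 1/x at 0, the same kind of indeterminate form as 0/0, and produces NaN.

```python
import torch

x = torch.tensor([0.0, 2.0], requires_grad=True)

# Hypothetical "safe" reciprocal: where() hides the inf of 1/0 in the forward.
safe_inv = torch.where(x != 0, 1.0 / x, torch.zeros_like(x))
loss = safe_inv.sum()  # 0 + 0.5, finite
loss.backward()

# where() still sends a zero gradient into the 1/x branch at x = 0, and the
# derivative of 1/x there is infinite, so that entry's gradient becomes NaN.
# The entry at x = 2 gets the ordinary -1/x**2 = -0.25.
print(loss.item())  # 0.5
print(x.grad)
```

A common way around this is to sanitize the input of the risky op as well, e.g. divide by `torch.where(x != 0, x, torch.ones_like(x))`, so the division never sees 0 in either pass.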
I’ve almost always had this problem as a result of taking the square root of 0.
.std() implicitly takes the square root.
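To illustrate the square-root-of-0 case with a variance (a sketch, assuming a recent PyTorch; the all-ones input and the `eps` value are made up): the square root is written out explicitly here because whether `.std()` itself returns NaN in this situation may depend on the PyTorch version. Adding a small `eps` inside the square root is one commonly used workaround, not something proposed in this thread.

```python
import torch

# Four identical values (a made-up input): their variance is exactly 0.
x = torch.ones(4, requires_grad=True)

# Forward: sqrt(var) = sqrt(0) = 0, which is finite.
s = torch.sqrt(x.var())
s.backward()
print(s.item())  # 0.0
print(x.grad)    # every entry is NaN: inf (sqrt at 0) times 0 (from var)

# Workaround: keep the argument of sqrt away from 0.
eps = 1e-8       # hypothetical value; choose one that suits your data scale
y = torch.ones(4, requires_grad=True)
s_safe = torch.sqrt(y.var() + eps)
s_safe.backward()
print(y.grad)    # finite (all zeros for this input)
```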
whooo, anomaly detection looks interesting
Is there any known issue with softmax and small values?
If my PyTorch version is 0.3, is there any way to find what causes the NaN gradient? Or could you please give some examples?
To check the forward pass, you will need to add prints.
To check the backward pass you will need to add hooks and prints.
Unfortunately there is no easier way than doing it by hand in older versions.
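A minimal sketch of the hooks-and-prints approach, written for current PyTorch (the helper `watch`, the parameter `w`, and the input values are hypothetical). In 0.3 the tensors would need to be wrapped in `Variable`, and since `torch.isnan` may not be available there, the comparison `grad != grad` (true only for NaN entries) can stand in for it. Each hook records whether the gradient reaching that intermediate contains NaN or inf, which shows where the NaN first appears.

```python
import torch

def watch(t, name, report):
    """Hypothetical helper: record whether the gradient reaching `t` has NaN/inf."""
    def hook(grad):
        report[name] = (torch.isnan(grad).any().item(),
                        torch.isinf(grad).any().item())
        # Returning None leaves the gradient unchanged.
    t.register_hook(hook)
    return t

report = {}
w = torch.zeros(3, requires_grad=True)  # hypothetical parameter stuck at 0
x = torch.tensor([1.0, 2.0, 3.0])

h = watch(w * x, "h", report)
h_sq = watch(h ** 2, "h_sq", report)
r = watch(torch.sqrt(h_sq), "r", report)
loss = r.sum()

print("loss:", loss.item())  # forward check with a plain print
loss.backward()

# Expected pattern: r is clean, h_sq receives inf from sqrt at 0,
# and h is the first tensor whose gradient is NaN (inf * 0 in pow's backward).
for name, (has_nan, has_inf) in report.items():
    print(f"{name}: nan={has_nan} inf={has_inf}")
```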
Hi~ Is it possible for the loss to be NaN while the gradients are not?
No, that should not be possible since the NaN loss value would be backpropagated and would create invalid gradients throughout the model. At least I wouldn’t know which operation can “recover” the gradient again and how it would work.
Got it. Thanks for your patience!