I am not posting the actual code since it is long and cannot easily be reduced to the problem. But essentially I have
output = net(input), which is a batchsize x 1 tensor.
I calculate the mean of output over the batch, and my loss function is
loss = (mean - 10).pow(2).
So I am trying to make the mean of my network output equal to 10. In the first 2 iterations the loss goes down; in the third iteration it suddenly jumps to a million and then becomes nan.
How can I debug such an issue (the definition of the loss was a simplification)?
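One way to narrow this down is to log the loss and the total gradient norm at every step and stop at the first non-finite loss, so you can see whether the gradient grows before the loss does. This is a minimal sketch under stated assumptions: the network is a hypothetical stand-in (`nn.Linear(4, 1)`), the inputs are random, and the optimizer (plain SGD) and learning rate are assumed, not taken from the original code.

```python
import math

import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the real network: 4 input features -> 1 output.
net = nn.Linear(4, 1)
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)  # assumed learning rate
TARGET = 10.0


def train_step(batch):
    """One optimization step; returns (loss, total gradient L2 norm)."""
    optimizer.zero_grad()
    output = net(batch)                 # shape: (batch_size, 1)
    mean = output.mean()                # mean over the batch
    loss = (mean - TARGET).pow(2)
    loss.backward()
    # Combined L2 norm of all parameter gradients, to spot sudden explosions.
    grad_norm = torch.sqrt(sum(p.grad.pow(2).sum() for p in net.parameters()))
    optimizer.step()
    return loss.item(), grad_norm.item()


history = []
for step in range(20):
    loss, grad_norm = train_step(torch.randn(32, 4))
    history.append((loss, grad_norm))
    print(f"step {step}: loss={loss:.4g} grad_norm={grad_norm:.4g}")
    if not math.isfinite(loss):
        print("non-finite loss, stopping")
        break
```

To find the operation that first produces a nan during the backward pass, wrapping training in `torch.autograd.set_detect_anomaly(True)` may also help, at the cost of noticeably slower execution.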
I think the problem is with your loss function. You should formulate it differently and add more constraints, otherwise the gradient is going to explode. Or try abs(mean - 10).
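The difference between the two losses shows up in the gradient with respect to the mean: for the squared loss it is 2(mean - 10), which grows with the distance from the target, while for abs(mean - 10) its magnitude stays at 1. A small sketch comparing them (the helper name `grad_wrt_mean` is hypothetical); it only illustrates gradient scale and does not by itself prove the squared loss causes the blow-up.

```python
import torch


def grad_wrt_mean(loss_fn, mean_value):
    """Return d(loss)/d(mean) evaluated at mean_value."""
    mean = torch.tensor(mean_value, requires_grad=True)
    loss = loss_fn(mean)
    loss.backward()
    return mean.grad.item()


def squared(m):
    return (m - 10).pow(2)


def absolute(m):
    return (m - 10).abs()


for m in [12.0, 50.0, 1000.0]:
    print(m, grad_wrt_mean(squared, m), grad_wrt_mean(absolute, m))
```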
But why? What is the problem with it? (By the way, I did the same in TensorFlow, at least I hope it is the same, and got no error.)
The problem might be that your loss has no upper bound. So if your
mean has a high value, the resulting loss will be extremely large, possibly resulting in an exploding gradient.
You might be able to alleviate this by limiting the loss,
either by clipping the loss to a maximum value or by limiting the range of the
mean values with a sigmoid before the loss calculation.
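Because a sigmoid outputs values in (0, 1) and the target here is 10, the squashed mean has to be rescaled for the target to be reachable. A minimal sketch, assuming a hypothetical scale factor `SCALE = 20` (any value above 10 would work): the squashed mean lies in (0, SCALE), so the loss can never exceed 100 with these numbers. Note that a saturated sigmoid also has a vanishing gradient, so extreme raw outputs receive almost no corrective signal.

```python
import torch

SCALE = 20.0    # hypothetical upper bound for the mean; must exceed TARGET
TARGET = 10.0


def bounded_loss(output):
    """Squared error on a sigmoid-squashed batch mean.

    The squashed mean lies in (0, SCALE), so the loss is at most
    max(TARGET, SCALE - TARGET) ** 2, i.e. 100 with these values.
    """
    raw_mean = output.mean()
    bounded_mean = SCALE * torch.sigmoid(raw_mean)
    return (bounded_mean - TARGET).pow(2)


extreme = torch.full((8, 1), 1e6)   # absurdly large network outputs
print(bounded_loss(extreme).item())
```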
You are right, the loss is suddenly growing. But it starts properly:
The sequence is: 1600, 1550, 1520, 600000, 5e17, nan
I am wondering why it suddenly goes up like that.
If I clip the loss, won't my gradient be 0 and training collapse?
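The concern is justified: `torch.clamp` passes no gradient back for inputs outside its range, so once the loss is above the cap, the parameters receive a zero gradient from it. A small sketch with an assumed cap of 100 (the helper name `clipped_loss_grad` is hypothetical):

```python
import torch


def clipped_loss_grad(mean_value, max_loss=100.0):
    """Return (clipped loss, d(clipped loss)/d(mean)) for loss = (mean - 10)^2."""
    mean = torch.tensor(mean_value, requires_grad=True)
    loss = (mean - 10).pow(2)
    clipped = torch.clamp(loss, max=max_loss)
    clipped.backward()
    return clipped.item(), mean.grad.item()


print(clipped_loss_grad(50.0))   # loss 1600 exceeds the cap: gradient is 0
print(clipped_loss_grad(12.0))   # loss 4 is below the cap: normal gradient
```

This is why clipping the gradient, rather than the loss value, is usually the safer way to limit step size.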
Sigmoid seems an interesting idea, I will try that, thanks.
Yes, you are right, I meant clipping the gradient. This should avoid extreme steps during the optimization.
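In PyTorch, gradient clipping by norm is typically done with `torch.nn.utils.clip_grad_norm_` between `loss.backward()` and `optimizer.step()`. A minimal sketch, assuming a hypothetical stand-in network, plain SGD, and an assumed threshold `MAX_NORM = 1.0` that would need tuning for a real model:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

net = nn.Linear(4, 1)                                   # hypothetical stand-in network
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)   # assumed optimizer settings
MAX_NORM = 1.0                                          # assumed clipping threshold


def clipped_step(batch, target=10.0):
    """One step with gradient-norm clipping; returns (loss, pre-clip grad norm)."""
    optimizer.zero_grad()
    loss = (net(batch).mean() - target).pow(2)
    loss.backward()
    # Rescales all gradients in place so their combined L2 norm is at most
    # MAX_NORM; the return value is the norm measured before clipping.
    total_norm = torch.nn.utils.clip_grad_norm_(net.parameters(), MAX_NORM)
    optimizer.step()
    return loss.item(), total_norm.item()


for step in range(5):
    print(clipped_step(torch.randn(32, 4)))
```

Logging the returned pre-clip norm also shows how often, and by how much, clipping kicks in.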
You might also try lowering your learning rate to see if the current learning rate is too high.
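A toy model shows how a learning rate that is too high makes this loss diverge. Assume a single weight w and a fixed input scale c, so mean = w * c (both hypothetical, not the original network). One gradient-descent step multiplies the error (mean - 10) by (1 - 2 * lr * c^2). If the absolute value of that factor is below 1, the error shrinks; above 1, the step overshoots the target, the error flips sign, and it grows geometrically. With c = 2 the threshold is lr = 0.25.

```python
def run_gd(lr, c=2.0, target=10.0, w=0.0, steps=30):
    """Plain gradient descent on loss = (w * c - target)**2; returns loss history."""
    losses = []
    for _ in range(steps):
        error = w * c - target
        losses.append(error ** 2)
        grad_w = 2.0 * error * c        # d(loss)/dw
        w -= lr * grad_w
    return losses


small = run_gd(lr=0.1)   # |1 - 2*lr*c**2| = 0.2 < 1: error shrinks every step
large = run_gd(lr=0.3)   # |1 - 2*lr*c**2| = 1.4 > 1: error grows every step
print(small[:4], small[-1])
print(large[:4], large[-1])
```

In a deep network the effective curvature changes during training, so such a threshold is not fixed. That is one possible reason a loss can look stable for a few steps and then blow up, though this toy does not diagnose the original model.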
With a much lower learning rate it works, but training is too slow and convergence is not as good as it should be (compared to the TensorFlow version).
OK, I can try clipping the gradient, thanks.
I have now tried the sigmoid, and the loss went down for many iterations (hundreds), then again suddenly went up within a very short time (10 iterations) and became nan again. I am more and more confused.
Thank you once more for your advice. The clipping did the trick insofar as training now converges. The final value is not very good, though.
Still, I am puzzled why this gradient explodes so suddenly. But it's good to be able to control it.