I think one issue is that simply adding an epsilon doesn't guarantee there is no division by zero, because there could be cancellation. You may want to take a look at the clamp function (torch.clamp — PyTorch 2.0 documentation) if you want to ensure that sinAngle stays within a safe range.
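For example, something along these lines (just a sketch; eps and the variable names stand in for whatever you have in your code):

import torch

eps = 1e-6
angle = torch.tensor([0.0, 1e-8, 0.5], requires_grad=True)
sinAngle = torch.sin(angle)
# Keep the divisor's magnitude at least eps while preserving its sign,
# so cosAngle/sinAngle stays finite even when the angle is numerically zero.
sign = torch.where(sinAngle >= 0, torch.ones_like(sinAngle), -torch.ones_like(sinAngle))
safeSin = sign * torch.clamp(sinAngle.abs(), min=eps)
ratio = torch.cos(angle) / safeSin
ratio.sum().backward()
print(angle.grad)  # finite, no NaN/inf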
@eqy, thanks, it is good to know about this function. I tried restricting the angle range to -90 to 90, but it still gives NaN. The problem is not the trigonometric range but the division; without the division it works just fine. Why is the loss fine, but its backpropagation through the division gives NaN?
I think you may have misunderstood my point. You can use clamp to ensure that the divisor does not become too small, though it may be simpler to do something like cosAngle/(sinAngle + torch.sign(sinAngle)*eps).
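A rough sketch of that idea (keep in mind torch.sign returns 0 at exactly 0, so an exactly-zero sine would still need a guard such as clamp):

import torch

eps = 1e-6
angle = torch.tensor([1e-8, 0.5, -0.3], requires_grad=True)
sinAngle = torch.sin(angle)
cosAngle = torch.cos(angle)
# Push the divisor away from zero in the direction of its own sign.
ratio = cosAngle / (sinAngle + torch.sign(sinAngle) * eps)
ratio.sum().backward()
print(angle.grad)  # finite as long as sinAngle is not exactly zero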
If the divisor becomes too small it can still cause numerical issues, since the loss can make the gradients of each layer explode or vanish depending on the depth of the model.
Here is a contrived example:
>>> import torch
>>> a = torch.ones(1, requires_grad=True)
>>> loss = (1 / torch.sin(a / 1e30)).sum()
>>> loss
tensor(1.0000e+30, grad_fn=<SumBackward0>)
>>> loss.backward()
>>> a
tensor([1.], requires_grad=True)
>>> a.grad
tensor([-inf])
In that example the forward value is large but finite, while the backward pass effectively needs 1/sin(x)^2, which overflows in float32, so the gradient comes out as inf. How badly this propagates depends on the depth of the network; I would recommend inspecting the .grad fields of the layers in your network and checking whether there is any trend, such as shallower layers having larger values.
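For instance (a minimal sketch with a hypothetical toy model; substitute your own model and loss):

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Linear(10, 1))
loss = model(torch.randn(4, 10)).sum()
loss.backward()

# Per-layer gradient summary: largest magnitude plus NaN/inf checks,
# so you can see whether values grow or shrink with depth.
for name, p in model.named_parameters():
    if p.grad is not None:
        g = p.grad
        print(name, g.abs().max().item(),
              torch.isnan(g).any().item(), torch.isinf(g).any().item())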