# Backpropagation gives NaN due to trigonometric division

Hi

I am using trigonometric values in my code, and after the first iteration the loss is NaN. I figured out it is due to a trigonometric term being divided.

```python
cosAngle = torch.cos(Chi)
sinAngle = torch.sin(Chi)
cosAngle = torch.unsqueeze(cosAngle, 1)
sinAngle = torch.unsqueeze(sinAngle, 1)
angle = cosAngle / sinAngle
```

If I remove `sinAngle`, it works. I tried `torch.asin` and also tried adding and multiplying an epsilon term, but no luck.

I would greatly appreciate a response.

Regards,
Sal

I think one issue is that simply adding an epsilon can't guarantee there is no division by zero, because there could be cancellation (e.g., if `sinAngle` happens to equal `-eps`, the divisor is exactly zero). You may want to take a look at the clamp function: torch.clamp — PyTorch 2.0 documentation, if you want to ensure that `sinAngle` stays outside a band around zero.
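A sign-preserving clamp along those lines might look like the sketch below (`eps` is an arbitrary threshold I picked; tune it for your problem):

```python
import torch

eps = 1e-4  # arbitrary lower bound on |sinAngle|

Chi = torch.tensor([0.0, 1.0], requires_grad=True)
sinAngle = torch.sin(Chi)

# Clamp the magnitude away from zero while keeping the sign.
# (Exact zeros are mapped to +eps, since torch.sign(0) would be 0.)
sign = torch.where(sinAngle >= 0, torch.ones_like(sinAngle), -torch.ones_like(sinAngle))
safe_sin = sign * torch.clamp(sinAngle.abs(), min=eps)

angle = torch.cos(Chi) / safe_sin
angle.sum().backward()
print(Chi.grad)  # finite everywhere, even where sin(Chi) == 0
```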

@eqy, thanks, it is good to know this function. I tried restricting the angle range to -90 to 90, but it still gives NaN. The problem is not the trigonometric range but the division: without the division it works just fine. Why is the loss OK but its backpropagation through the division gives NaN?

I think you may have misunderstood my point. You can use clamp to ensure that the divisor does not become too small, though it may be simpler to do something like `cosAngle/(sinAngle + torch.sign(sinAngle)*eps)`.
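A quick check of that division pattern (the `eps` value is arbitrary; note the caveat that `torch.sign(0) == 0`, so an exactly-zero `sinAngle` would still divide by zero and would need the clamp approach instead):

```python
import torch

eps = 1e-4
Chi = torch.tensor([0.01, -0.01, 1.0], requires_grad=True)
sinAngle = torch.sin(Chi)

# Push the divisor away from zero in the direction of its sign.
# Caveat: torch.sign(0) == 0, so an exactly-zero sin still divides by 0.
angle = torch.cos(Chi) / (sinAngle + torch.sign(sinAngle) * eps)
angle.sum().backward()
print(Chi.grad)  # finite for all three elements here
```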

If the divisor becomes too small it can still cause numerical issues: the resulting large loss values can make the gradients of each layer explode or vanish, depending on the depth of the model.
E.g., a contrived example:

```python
>>> a = torch.ones(1, requires_grad=True)
>>> loss = (1 / torch.sin(a / 1e30)).sum()
>>> loss  # enormous but finite
>>> loss.backward()
>>> a.grad  # non-finite: the backward pass squares the tiny divisor
```

It depends on the depth of the network. I would recommend inspecting the `.grad` fields of the layers in your network and checking whether there is a trend, such as shallower layers having larger values.
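One way to do that inspection is sketched below; `model` is a stand-in `nn.Sequential` for illustration, so substitute your own network and loss:

```python
import torch
import torch.nn as nn

# Stand-in model; replace with your own network and loss computation.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
loss = model(torch.randn(16, 4)).pow(2).mean()
loss.backward()

# Print the gradient norm of each parameter, shallowest first,
# to spot exploding/vanishing trends across depth.
for name, p in model.named_parameters():
    if p.grad is not None:
        print(f"{name}: grad norm = {p.grad.norm().item():.3e}")
```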