I have the following loss function, for which I want to compute the gradients.

``````
import torch
import torch.nn.functional as F

z = torch.tensor([1.], requires_grad=True, dtype=torch.float64)
margin = torch.tensor([0.], dtype=torch.float64)  # value under test; the problem described below occurs at exactly 0
thresh = 1e-10
alpha = 1e-5  # not used below
beta = 1.
tau = 1.
loss = F.softplus(tau - z / (margin + torch.sign(margin) * thresh), beta=beta) * torch.abs(margin + torch.sign(margin) * thresh)
``````

Mathematically, the gradient with respect to z is the same as:

``````
manual_grad = torch.sigmoid(beta * (tau - z / (margin + torch.sign(margin) * thresh))) * (-torch.sign(margin))
``````
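
(For reference, this is just the chain-rule step behind that claim, a sketch written with $m' = \mathrm{margin} + \operatorname{sign}(\mathrm{margin})\cdot\mathrm{thresh}$:)

$$
\frac{\partial\,\mathrm{loss}}{\partial z}
= \sigma\!\left(\beta\,(\tau - z/m')\right)\cdot\left(-\frac{1}{m'}\right)\cdot|m'|
= -\,\sigma\!\left(\beta\,(\tau - z/m')\right)\operatorname{sign}(m'),
$$

and $\operatorname{sign}(m') = \operatorname{sign}(\mathrm{margin})$ whenever margin is nonzero.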

Moreover, the gradient also exists for margin = 0 (exactly 0). However, when I compute the gradient of the loss above with autograd, it comes out to be NaN at margin = 0.
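
Here is a minimal repro of what I am seeing (a sketch of my setup; the helper `autograd_vs_manual` and the concrete test values 0.5 and 0.0 are just for illustration):

``````
import torch
import torch.nn.functional as F

def autograd_vs_manual(margin_value, thresh=1e-10, beta=1., tau=1.):
    # illustrative helper wrapping the loss above and the manual gradient expression
    z = torch.tensor([1.], requires_grad=True, dtype=torch.float64)
    margin = torch.tensor([margin_value], dtype=torch.float64)
    denom = margin + torch.sign(margin) * thresh

    loss = F.softplus(tau - z / denom, beta=beta) * torch.abs(denom)
    loss.backward()

    with torch.no_grad():
        manual = torch.sigmoid(beta * (tau - z / denom)) * (-torch.sign(margin))
    return z.grad, manual

print(autograd_vs_manual(0.5))  # autograd and manual gradients agree
print(autograd_vs_manual(0.0))  # autograd gradient is nan, manual expression gives 0.
``````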

Can anyone help with this?

Hi Gantavya!

At a technical level, neither `loss` nor its gradient is defined when
`margin = 0`.

Note that `torch.sign (torch.tensor ([0.0]))` is zero, so the `thresh` nudge
vanishes and the argument to `softplus()` diverges (and `softplus()` itself
either diverges or becomes zero, depending on the sign of `z`).
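
For example, here is a quick check of the float behavior at exactly zero:

``````
import torch

margin = torch.tensor([0.0])
print(torch.sign(margin))                           # tensor([0.])
print(margin + torch.sign(margin) * 1e-10)          # tensor([0.]) -- the thresh nudge vanishes
print(1.0 / (margin + torch.sign(margin) * 1e-10))  # tensor([inf]) -- so softplus()'s argument diverges
``````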

You might hope that when `margin` is very small, but not quite zero, you
would get a sensible value for the gradient, and that you could then define
the gradient at `margin = 0` by evaluating it for small `margin` and taking
its limit as `margin` goes to zero. But, as shown by your expression for
`manual_grad`, this doesn’t work.

It is true that `manual_grad` evaluates to zero for `margin = 0.0`, but, in
isolation, this isn’t meaningful. When `margin` is very small and negative,
`manual_grad` becomes one (to machine precision), but when `margin` is
very small and positive, `manual_grad` underflows to zero. These numerical
computations are telling you that the gradient is discontinuous as a function
of `margin` when `margin` is equal to zero, so trying to define the gradient
by taking the limit as `margin` goes to zero leaves the gradient undefined.
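
Here is a small numerical check of that (plugging your values z = 1., tau = 1.,
beta = 1., thresh = 1e-10 into your `manual_grad` expression; the small test
margins are just for illustration):

``````
import torch

def manual_grad(margin_value, z=1., tau=1., beta=1., thresh=1e-10):
    # your manual_grad expression, evaluated at a given margin
    margin = torch.tensor([margin_value], dtype=torch.float64)
    denom = margin + torch.sign(margin) * thresh
    return torch.sigmoid(beta * (tau - z / denom)) * (-torch.sign(margin))

print(manual_grad(-1e-8))  # essentially 1. -- the limit from the negative side
print(manual_grad(+1e-8))  # underflows to 0. -- the limit from the positive side
print(manual_grad(0.0))    # 0., but, in isolation, this value isn't meaningful
``````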

Autograd reasonably gives you `nan` for this undefined value. (Simply
asserting that the gradient ought to be zero or constructing some expression
that returns zero doesn’t change the fact that the gradient isn’t well defined
when `margin = 0.0`.)

Best.

K. Frank