# Why are gradients not zero at global minimum?

I made a function to compute the IoU loss of 2 spans

```python
def iou_loss(pred_start, pred_end, target_start, target_end):
    """
    pred_start, pred_end, target_start, target_end: b, *
    """

    num_common = (
        torch.minimum(pred_end, target_end)
        - torch.maximum(pred_start, target_start) + 1
    ).clamp_min_(0)

    num_pred = (pred_end - pred_start + 1).clamp_min_(0)
    num_target = target_end - target_start + 1

    iou = num_common / (num_pred + num_target - num_common)

    return 1 - iou
```

However the gradients are not zero when the loss is at its global minimum.

```python
pred_start = torch.tensor([5, 2], dtype=torch.float, requires_grad=True)
pred_end = torch.tensor([5, 4], dtype=torch.float, requires_grad=True)
target_start = torch.tensor([5, 2])
target_end = torch.tensor([5, 4])

l = iou_loss(pred_start, pred_end, target_start, target_end)
print(l)

l = l.mean()
l.backward()

print(pred_start.grad)
# tensor([-0.5000, -0.1667])
print(pred_end.grad)
# tensor([0.5000, 0.1667])
```

Can anyone help me to understand why? Any help would be appreciated.

Hi jjhh!

The short story is that `torch.minimum()` and `torch.maximum()` are not
differentiable at the special points where the arguments are equal.
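To see what this means in practice, here is a quick probe (a sketch with made-up values) of `torch.maximum()` just below and just above a tie. The gradient with respect to `x` jumps from `0` to `1` as `x` crosses the other argument:

```python
import torch

t = torch.tensor([1.0])  # fixed second argument

# just below the tie: maximum() selects t, so x receives zero gradient
x = torch.tensor([0.999], requires_grad=True)
torch.maximum(x, t).sum().backward()
print(x.grad)  # tensor([0.])

# just above the tie: maximum() selects x, so x receives the full gradient
x = torch.tensor([1.001], requires_grad=True)
torch.maximum(x, t).sum().backward()
print(x.grad)  # tensor([1.])
```

There is no single value that both one-sided derivatives agree on at the tie itself.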

The function `abs(x)` offers a simpler example. When `x > 0`, the derivative
is `+1`, while when `x < 0`, it's `-1`. When `x = 0`, mathematically speaking,
the derivative is not defined. (One could try to play some game where you
define the derivative to be `0` when `x = 0`, and pytorch does do this for the
`abs()` function, but doing so is problematic, and not really worth the bother.)

Here is an example script that probes your loss function with values where
the arguments are close, but not exactly equal:

```python
import torch
print(torch.__version__)

def iou_loss(pred_start, pred_end, target_start, target_end):
    """
    pred_start, pred_end, target_start, target_end: b, *
    """

    num_common = (
        torch.minimum(pred_end, target_end)
        - torch.maximum(pred_start, target_start) + 1
    ).clamp_min_(0)

    num_pred = (pred_end - pred_start + 1).clamp_min_(0)
    num_target = target_end - target_start + 1

    iou = num_common / (num_pred + num_target - num_common)

    return 1 - iou

pred_start = torch.tensor([5, 2], dtype=torch.float, requires_grad=True)
pred_end = torch.tensor([5, 4], dtype=torch.float, requires_grad=True)
target_start = torch.tensor([5, 2])
target_end = torch.tensor([5, 4])

delta = torch.tensor([1.e-5, 0.0])

iou_loss(pred_start, pred_end, target_start, target_end).mean().backward()
print('no delta:    pred_start.grad =', pred_start.grad)

pred_start.grad = None  # reset so the gradients don't accumulate across calls
iou_loss(pred_start + delta, pred_end, target_start, target_end).mean().backward()
print('plus delta:  pred_start.grad =', pred_start.grad)

pred_start.grad = None
iou_loss(pred_start - delta, pred_end, target_start, target_end).mean().backward()
print('minus delta: pred_start.grad =', pred_start.grad)
```

Here is its output:

```
1.7.1
no delta:    pred_start.grad = tensor([-0.5000, -0.1667])
plus delta:  pred_start.grad = tensor([ 0.5000, -0.1667])
minus delta: pred_start.grad = tensor([-0.5000, -0.1667])
```

You can see the gradient jump as you cross the equality boundary (just
like with the `abs(x)` example).

Best.

K. Frank


Hi K.Frank,

Thank you for the explanation.

```python
def iou_loss(pred_start, pred_end, target_start, target_end):
    """
    pred_start, pred_end, target_start, target_end: b, *
    """

    num_common = (pred_end - pred_start + 1).clamp_min_(0)

    num_pred = (pred_end - pred_start + 1).clamp_min_(0)
    num_target = target_end - target_start + 1

    iou = num_common / (num_pred + num_target - num_common)

    return 1 - iou
```

I tested the function without `minimum` and `maximum`, but I still got the same gradients.

Hi jjhh!

First, I do not get the same gradients using your new version of `iou_loss()`.

Second, while your new `iou_loss()` still returns zero for the input parameters
given in your first post, this is no longer the minimum as `iou_loss()` can now
become negative.
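To illustrate the point (a quick sketch with made-up spans): because `num_common` is now computed the same way as `num_pred`, the loss reduces to `1 - num_pred / num_target`, which goes negative whenever the predicted span is longer than the target span:

```python
import torch

def iou_loss_v2(pred_start, pred_end, target_start, target_end):
    # the modified version: num_common no longer uses minimum() / maximum()
    num_common = (pred_end - pred_start + 1).clamp_min_(0)
    num_pred = (pred_end - pred_start + 1).clamp_min_(0)
    num_target = target_end - target_start + 1
    iou = num_common / (num_pred + num_target - num_common)
    return 1 - iou

# predicted span [0, 9] (length 10) vs. target span [2, 4] (length 3):
# loss = 1 - 10 / 3, which is negative
loss = iou_loss_v2(torch.tensor([0.0]), torch.tensor([9.0]),
                   torch.tensor([2.0]), torch.tensor([4.0]))
print(loss)  # tensor([-2.3333])
```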

If you believe otherwise please post a complete, runnable script, together with
its output, that demonstrates your result.

Best.

K. Frank

Hi K.Frank, you are right. The loss becomes `1 - num_pred / num_target` now, and it can be negative. The signs of the gradients switched, too. Sorry, it was a careless question. Thank you for your help.