Why are gradients not zero at the global minimum?

I made a function to compute the IoU loss of two spans:

import torch

def iou_loss(pred_start, pred_end, target_start, target_end):
    """
    pred_start, pred_end, target_start, target_end: shape (b, *)
    Spans are inclusive index ranges, hence the +1 in the length computations.
    """

    # length of the overlap between the predicted and target spans
    num_common = (
        torch.minimum(pred_end, target_end)
        - torch.maximum(pred_start, target_start) + 1
    ).clamp_min_(0)

    # lengths of the predicted and target spans
    num_pred = (pred_end - pred_start + 1).clamp_min_(0)
    num_target = target_end - target_start + 1

    # intersection over union of the two spans
    iou = num_common / (num_pred + num_target - num_common)

    return 1 - iou

However, the gradients are not zero when the loss is at its global minimum:

pred_start = torch.tensor([5, 2], dtype=torch.float, requires_grad=True)
pred_end = torch.tensor([5, 4], dtype=torch.float, requires_grad=True)
target_start = torch.tensor([5, 2])
target_end = torch.tensor([5, 4])

l = iou_loss(pred_start, pred_end, target_start, target_end)
print(l)
# tensor([0., 0.], grad_fn=<RsubBackward1>)

l = l.mean()
l.backward()

print(pred_start.grad)
# tensor([-0.5000, -0.1667])

print(pred_end.grad)
# tensor([0.5000, 0.1667])

Can anyone help me understand why? Any help would be appreciated.

Hi jjhh!

The short story is that torch.minimum() and torch.maximum() are not
differentiable at the special points where the arguments are equal.

The function abs(x) offers a simpler example. When x > 0, the derivative
is +1, while when x < 0, it’s -1. When x = 0, mathematically speaking,
the derivative is not defined. (One could try to play some game where you
define the derivative to be 0 when x = 0 – and pytorch does do this for the
abs() function – but doing so is problematic, and not really worth the bother.)
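
For instance, here is a quick probe of what autograd reports for abs() at, and
on either side of, zero (the values are made up just for this illustration):

import torch

x = torch.tensor([-1.0, 0.0, 1.0], requires_grad=True)
torch.abs(x).sum().backward()
print(x.grad)
# expect tensor([-1., 0., 1.]) – the reported derivative jumps as you cross zero,
# and pytorch's convention fills in 0 at the non-differentiable point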

Here is an example script that probes your loss function with values where
the arguments are close, but not exactly equal:

import torch
print (torch.__version__)

def iou_loss(pred_start, pred_end, target_start, target_end):
    """
    pred_start, pred_end, target_start, target_end: b, *
    """

    num_common = (
        torch.minimum(pred_end, target_end)
        - torch.maximum(pred_start, target_start) + 1
    ).clamp_min_(0)

    num_pred = (pred_end - pred_start + 1).clamp_min_(0)
    num_target = target_end - target_start + 1

    iou = num_common / (num_pred + num_target - num_common)

    return 1 - iou


pred_start = torch.tensor([5, 2], dtype=torch.float, requires_grad=True)
pred_end = torch.tensor([5, 4], dtype=torch.float, requires_grad=True)
target_start = torch.tensor([5, 2])
target_end = torch.tensor([5, 4])


delta = torch.tensor ([1.e-5, 0.0])

iou_loss(pred_start, pred_end, target_start, target_end).mean().backward()
print ('no delta:    pred_start.grad =', pred_start.grad)

pred_start.grad *= 0.0
pred_end.grad *= 0.0
iou_loss(pred_start + delta, pred_end, target_start, target_end).mean().backward()
print ('plus delta:  pred_start.grad =', pred_start.grad)

pred_start.grad *= 0.0
pred_end.grad *= 0.0
iou_loss(pred_start - delta, pred_end, target_start, target_end).mean().backward()
print ('minus delta: pred_start.grad =', pred_start.grad)

Here is its output:

1.7.1
no delta:    pred_start.grad = tensor([-0.5000, -0.1667])
plus delta:  pred_start.grad = tensor([ 0.5000, -0.1667])
minus delta: pred_start.grad = tensor([-0.5000, -0.1667])

You can see the gradient jump as you cross the equality boundary (just
like with the abs(x) example).
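
If you want to see the same effect in isolation, here is a minimal probe of
torch.maximum() around a tie (made-up inputs; the value reported exactly at the
tie depends on the tie-breaking convention of your pytorch version):

import torch

# gradient with respect to a, just below, exactly at, and just above the tie
for a_val in (4.9999, 5.0, 5.0001):
    a = torch.tensor(a_val, requires_grad=True)
    b = torch.tensor(5.0)
    torch.maximum(a, b).backward()
    print(a_val, a.grad)
# away from the tie the gradient jumps from 0 (a < b) to 1 (a > b);
# exactly at the tie there is no single "correct" value for pytorch to report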

Best.

K. Frank


Hi K. Frank,

Thank you for the explanation.

def iou_loss(pred_start, pred_end, target_start, target_end):
    """
    pred_start, pred_end, target_start, target_end: b, *
    """

    # minimum/maximum removed: the overlap is now just the predicted span length
    num_common = (pred_end - pred_start + 1).clamp_min_(0)

    num_pred = (pred_end - pred_start + 1).clamp_min_(0)
    num_target = target_end - target_start + 1

    iou = num_common / (num_pred + num_target - num_common)

    return 1 - iou

I tested the function without minimum and maximum, but I still got the same gradients.

Hi jjhh!

First, I do not get the same gradients using your new version of iou_loss().

Second, while your new iou_loss() still returns zero for the input parameters
given in your first post, this is no longer the global minimum, as iou_loss() can
now become negative.
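
To make that concrete, here is a small check with made-up spans (using your new
iou_loss() from the previous post): a predicted span longer than the target
already drives the loss below zero.

import torch

pred_start = torch.tensor([2.0])
pred_end = torch.tensor([6.0])      # predicted span [2, 6], length 5
target_start = torch.tensor([2.0])
target_end = torch.tensor([4.0])    # target span [2, 4], length 3

# num_common == num_pred == 5 and num_target == 3, so the new loss is
# 1 - 5 / 3, i.e. about -0.67
print(iou_loss(pred_start, pred_end, target_start, target_end))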

If you believe otherwise, please post a complete, runnable script, together with
its output, that demonstrates your result.

Best.

K. Frank

Hi K. Frank,

You are right: the loss becomes 1 - num_pred / num_target now, and it can be negative. The signs of the gradients switched, too. Sorry, it was a stupid question. Thank you for your help.