Confusion about backward of clamp operation

    // t will be the "min" argument of clamp; I also want its gradient
    torch::Tensor t = torch::tensor(0.0, torch::kFloat32);
    t.requires_grad_(true);

    std::vector<float> data{ -0.2, 0.3, 0.4, -0.1, -0.2, -0.2 };
    torch::Tensor a = torch::from_blob(data.data(), { 2, 3 }, torch::kFloat32);
    a.requires_grad_(true);

    // clamp(a, min = t): every element of a below t is replaced by t
    auto mr = torch::clamp(a, t);
    mr.backward(torch::ones_like(mr));

    LOG(INFO) << t.grad();  // prints 4
    LOG(INFO) << a.grad();

I want to figure out the gradient of the second parameter (the min value) of the clamp function. With the code above, d(mr)/dt comes out as 4, which looks like the gradient of t equals the number of elements that are less than t (0.0).
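
If my reading is right, that value follows from the definition of clamp itself: wherever an element of a is below t, the output at that position is exactly t, so its derivative with respect to t is 1, and elsewhere it is 0 (leaving aside the boundary case a == t, which does not occur in this data). With an upstream gradient of all ones, the chain rule would then give

$$\frac{\partial\,\mathrm{clamp}(a_{ij},\,t)}{\partial t}=\begin{cases}1 & a_{ij}<t\\ 0 & a_{ij}>t\end{cases}\qquad\Longrightarrow\qquad \frac{\partial L}{\partial t}=\sum_{i,j}\mathbf{1}[a_{ij}<t]=4.$$

Is that the right way to think about it?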

I could not find the source code of torch::clamp in LibTorch; the only thing I could find is something written in Python style. It seems the code below does not return a gradient for the second or third parameter:

    from torch.autograd import Function, Variable

    class Clamp(Function):

        @staticmethod
        def forward(ctx, i, min_val, max_val):
            # remember which elements were inside [min_val, max_val]
            ctx._mask = (i.ge(min_val) * i.le(max_val))
            return i.clamp(min_val, max_val)

        @staticmethod
        def backward(ctx, grad_output):
            # gradient passes through only where the input was not clamped;
            # None is returned for both min_val and max_val
            mask = Variable(ctx._mask.type_as(grad_output.data))
            return grad_output * mask, None, None
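
For my own understanding I sketched what a version that does return a gradient for min_val might look like. This is only my guess at the behaviour I observed, not LibTorch's actual implementation; it assumes min_val is a 0-dim tensor and ignores the max argument:

    import torch
    from torch.autograd import Function

    class ClampMin(Function):
        """Hypothetical clamp-with-min that also returns d(out)/d(min_val)."""

        @staticmethod
        def forward(ctx, i, min_val):
            ctx.save_for_backward(i, min_val)
            return torch.clamp(i, min=min_val)

        @staticmethod
        def backward(ctx, grad_output):
            i, min_val = ctx.saved_tensors
            passed = (i >= min_val).type_as(grad_output)   # input passed through
            clamped = (i < min_val).type_as(grad_output)   # output was min_val here
            grad_i = grad_output * passed
            # each clamped position contributes d(out)/d(min_val) = 1,
            # so min_val collects the upstream gradient summed over those positions
            grad_min = (grad_output * clamped).sum()
            return grad_i, grad_min

    a = torch.tensor([[-0.2, 0.3, 0.4], [-0.1, -0.2, -0.2]], requires_grad=True)
    t = torch.tensor(0.0, requires_grad=True)
    out = ClampMin.apply(a, t)
    out.backward(torch.ones_like(out))
    print(t.grad)   # tensor(4.), the same value LibTorch reports above

That reproduces the 4 I see from LibTorch, but I don't know whether this is actually how the built-in clamp backward is defined.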

So what confuses me is: how does LibTorch actually produce this gradient of mr with respect to t?

I think the derivative is defined here and this comment might be relevant:

For clamp, gradient is not defined at the boundaries. But empirically it’s helpful to be able to get gradient on min and max, so we return the subgradient 1 for these cases.