How do I calculate the gradients of a non-leaf variable w.r.t. a loss function?

If my understanding is correct, calling .backward() on a Variable only populates the gradients of the leaf nodes. Is there any way to calculate the gradients w.r.t. an arbitrary Variable, sort of like torch.gradients(loss, variables)?


Hello @Abhai_Kollara

Update:
The proper solution is to use .retain_grad()

v = torch.autograd.Variable(torch.randn(3), requires_grad=True)
v2 = v + 1            # v2 is a non-leaf Variable
v2.retain_grad()      # ask autograd to keep v2's gradient after backward
v2.sum().backward()
v2.grad               # now populated

Apparently, this use case is common enough that there is a dedicated method for it.

This is what I had posted before I knew better:

How about using hooks, e.g.:

v = torch.autograd.Variable(torch.randn(3), requires_grad=True)

def require_nonleaf_grad(v):
    def hook(g):
        v.grad_nonleaf = g      # stash the incoming gradient on the Variable
    v.register_hook(hook)       # the hook fires when v's gradient is computed

v2 = v + 1
require_nonleaf_grad(v2)
v2.sum().backward()
v2.grad_nonleaf                 # gradient of the sum w.r.t. v2

I don’t recommend calling the attribute .grad, to avoid colliding with PyTorch internals.

Best regards

Thomas


Hello @tom

I want to get the gradients of the discriminator network w.r.t. the fake variable generated by the generator net. I tried both of your solutions: with the first one I get None for the gradient value, and with the second one I get the following error:
AttributeError: 'Tensor' object has no attribute 'grad_nonleaf'

Do you have any idea of what’s wrong?

Most likely, your generator output doesn’t require gradients, perhaps because the generator is in eval mode or its parameters have requires_grad=False.
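For a quick check, something along these lines should print a non-None gradient (a minimal sketch, the two Linear layers are just stand-ins for your networks):

import torch

generator = torch.nn.Linear(100, 32)      # stand-in for your generator
discriminator = torch.nn.Linear(32, 1)    # stand-in for your discriminator

noise = torch.randn(16, 100)
fake = generator(noise)                   # non-leaf; requires grad through the generator's parameters
assert fake.requires_grad                 # if this fails, gradients were disabled upstream
fake.retain_grad()                        # keep the gradient of this non-leaf tensor
discriminator(fake).sum().backward()
print(fake.grad)                          # gradient of the (dummy) loss w.r.t. the fake sample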

Best regards

Thomas

Dear Thomas

Thanks for your post. I am also encountering the ‘optimize a non-leaf variable’ issue, and I would be grateful if you could give some feedback on my approach.

What I want to do is use a learnable linear combination of basis filters as the convolution weights, and optimize only the linear coefficients, as below:

class net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        filter_basis = ...  # fixed basis filters, shape [f_height, f_width, num_filter]
        self.coeff = ...    # shape [num_filter, c_in, c_out]; the only thing to be optimized

        self.conv2d = torch.nn.Conv2d(c_in, c_out, kernel_size=f_height, stride=1, padding=1)
        self.conv2d.weight = torch.matmul(filter_basis, self.coeff)  # linear combination of pre-fixed filters as weights

    def forward(self, data):
        return self.conv2d(data)

    def train(self):
        loss = ...
        optim(self.coeff, loss)

How should I achieve my goal?
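In case it makes the intent clearer, here is the rough shape of what I imagine the fix might look like (my own untested sketch; the Parameter/buffer split and the einsum contraction are guesses on my part):

import torch
import torch.nn.functional as F

class BasisConv(torch.nn.Module):
    def __init__(self, filter_basis, c_in, c_out):
        super().__init__()
        # filter_basis: fixed tensor of shape [num_filter, f_height, f_width];
        # registered as a buffer so it is not optimized
        self.register_buffer('filter_basis', filter_basis)
        num_filter = filter_basis.shape[0]
        # the linear coefficients are the only learnable parameters
        self.coeff = torch.nn.Parameter(torch.randn(c_out, c_in, num_filter))

    def forward(self, data):
        # build the convolution weight as a linear combination of the basis filters,
        # giving a weight of shape [c_out, c_in, f_height, f_width]
        weight = torch.einsum('oik,khw->oihw', self.coeff, self.filter_basis)
        return F.conv2d(data, weight, stride=1, padding=1)

If I understand correctly, model.parameters() would then contain only coeff, so passing it to the optimizer would optimize just the coefficients. Is that the right way to do it?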

Perfect solution for getting gradients of non-leaf variables in autograd

Hi @tom,

Thanks for the require_nonleaf_grad(my_tensor) solution!

However, when I use it, the memory on my GPU quickly fills up. Even if I delete my_tensor.grad_nonleaf immediately after calling loss.backward(), the gradient tensor is still kept alive in exactly 50% of the cases. I know this because I store my_tensor.storage().data_ptr() in the set addresses, and every 500th training batch I use Python’s built-in garbage collector to find the existing tensors and count those t for which t.storage().data_ptr() in addresses; this count increases by 250 each time I do so.
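Roughly, the counting code looks like this (a simplified sketch of what I described above):

import gc
import torch

addresses = set()   # during training: addresses.add(my_tensor.storage().data_ptr()) for each batch

def count_tracked_tensors():
    # walk all objects the garbage collector knows about and count the tensors
    # whose storage address was recorded earlier
    return sum(
        1
        for obj in gc.get_objects()
        if torch.is_tensor(obj) and obj.storage().data_ptr() in addresses
    )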

Do you know how that can be? I don’t even use my_tensor.grad_nonleaf for anything yet; I just request that it be created and then delete it as soon as I get the opportunity. How can I debug this most easily?

Note that the proper solution is to use retain_grad.
It’s really easy to create circular references between gradients and Tensors, so it may help to set the gradient to None explicitly once you are done with it (see the snippet below).
Other than that, the fact that it happens every second time might point to something else being off in your code, too. It’s hard to tell.
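For example, something like this (a self-contained sketch; my_tensor stands in for the non-leaf tensor from your training loop):

import torch

x = torch.randn(3, requires_grad=True)
my_tensor = x + 1                     # the non-leaf tensor
my_tensor.retain_grad()
loss = my_tensor.sum()
loss.backward()
grad_copy = my_tensor.grad.clone()    # copy out whatever you actually need
my_tensor.grad = None                 # drop the reference explicitly so the storage can be freed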

Best regards

Thomas

Thanks for your answer! I switched from require_nonleaf_grad to retain_grad, and now the problem seems to be gone.