How do I calculate the gradients of a non-leaf variable w.r.t. a loss function?

If my understanding is correct, calling .backward() on a Variable only populates the gradients of the leaf nodes. Is there any way to calculate the gradients w.r.t. an arbitrary Variable, sort of like torch.gradients(loss, variables)?


Hello @Abhai_Kollara

The proper solution is to use .retain_grad():

v = torch.autograd.Variable(torch.randn(3), requires_grad=True)
v2 = v + 1
v2.retain_grad()  # v2.grad will now be populated by backward()

Apparently, this is common enough.
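For completeness, a minimal end-to-end example using the modern tensor API (torch.autograd.Variable has since been folded into Tensor):

```python
import torch

v = torch.randn(3, requires_grad=True)  # leaf
v2 = v + 1                              # non-leaf
v2.retain_grad()                        # ask autograd to populate v2.grad

loss = (v2 ** 2).sum()
loss.backward()

print(v2.grad)  # d loss / d v2 = 2 * v2, retained because of retain_grad()
print(v.grad)   # leaf gradients are populated as usual
```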

This is what I had posted before I knew better:

how about using hooks, e.g.

v = torch.autograd.Variable(torch.randn(3), requires_grad=True)
def require_nonleaf_grad(v):
    def hook(g):
        v.grad_nonleaf = g
    v.register_hook(hook)
v2 = v + 1
require_nonleaf_grad(v2)

I don’t recommend calling the attribute .grad, to avoid colliding with PyTorch internals.
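To illustrate, here is a self-contained version of the hook approach with a backward pass (modern tensor API; grad_nonleaf is just the ad-hoc attribute name from above):

```python
import torch

def require_nonleaf_grad(t):
    def hook(g):
        t.grad_nonleaf = g  # stash the incoming gradient on the tensor
    t.register_hook(hook)

v = torch.randn(3, requires_grad=True)
v2 = v + 1                  # non-leaf
require_nonleaf_grad(v2)

(v2 ** 2).sum().backward()
print(v2.grad_nonleaf)      # the same value .retain_grad() would put in v2.grad
```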

Best regards



Hello @tom

I want to get the gradients of the discriminator network w.r.t. the fake variable generated by the generator net. I tried both of your solutions: with the first I get None for the gradient value, and with the second I get the following error:
AttributeError: ‘Tensor’ object has no attribute ‘grad_nonleaf’

Do you have any idea of what’s wrong?

Most likely, your generator output doesn’t require gradients, e.g. because it was computed under torch.no_grad() or because requires_grad=False is set on the generator’s parameters.
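As an aside, if you only need the gradient of the loss w.r.t. one specific non-leaf tensor, torch.autograd.grad returns it directly without touching .grad at all. A sketch with toy stand-ins for the GAN pieces (the names gen/disc/fake are just for illustration):

```python
import torch

gen = torch.nn.Linear(3, 8)    # toy "generator"
disc = torch.nn.Linear(8, 1)   # toy "discriminator"

z = torch.randn(4, 3)
fake = gen(z)                  # non-leaf: output of the generator
d_loss = disc(fake).mean()

# gradient of the loss w.r.t. the non-leaf tensor, no hooks or .grad needed
grad_fake, = torch.autograd.grad(d_loss, fake)
print(grad_fake.shape)  # torch.Size([4, 8])
```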

Best regards


Dear Thomas

Thanks for your post. I am also encountering the ‘optimize a non-leaf variable’ issue, and I would be grateful for some feedback on my approach.

What I want to do is use a learnable linear combination of basis filters for the convolution operation and optimize only the linear coefficients, as below:

class net(torch.nn.Module):
    def __init__(self):
        self.filter_basis = ...  # shape [f_height, f_width, num_filter], fixed
        self.coeff = ...         # shape [num_filter, c_in, c_out], to be optimized

        self.conv2d = torch.nn.Conv2d(c_in, c_out, kernel_size=3, stride=1, padding=1)
        self.conv2d.weight = torch.matmul(self.filter_basis, self.coeff)  # linear combination of pre-fixed filters as weights

    def forward(self, data):
        return self.conv2d(data)

    def train(self):
        loss = ...
        optim(self.coeff, loss)

How should I achieve my goal?
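One way to get there, sketched under the assumption that the basis filters are fixed and only the coefficients are learned: build the weight in forward() from the coefficients and call torch.nn.functional.conv2d directly, instead of assigning to an nn.Conv2d’s .weight (BasisConv2d is a hypothetical name):

```python
import torch
import torch.nn.functional as F

class BasisConv2d(torch.nn.Module):
    """Hypothetical module: convolution whose weight is a learned
    linear combination of fixed basis filters."""
    def __init__(self, c_in, c_out, filter_basis):
        super().__init__()
        # filter_basis: (num_filter, f_height, f_width); a buffer, so it is
        # saved with the model but not returned by .parameters()
        self.register_buffer("filter_basis", filter_basis)
        num_filter = filter_basis.shape[0]
        # only the coefficients are optimized
        self.coeff = torch.nn.Parameter(torch.randn(c_out, c_in, num_filter))

    def forward(self, x):
        # weight[o, i] = sum_n coeff[o, i, n] * filter_basis[n]
        weight = torch.einsum("oin,nhw->oihw", self.coeff, self.filter_basis)
        return F.conv2d(x, weight, stride=1, padding=1)
```

Because the weight is recomputed from self.coeff on every forward pass, gradients flow into coeff, and an optimizer built over model.parameters() sees only coeff, which is what the train sketch above intends.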

Perfect solution for getting gradients of non-leaf variables in autograd

Hi @tom,

Thanks for the require_nonleaf_grad(my_tensor) solution!

However, when I use it, the memory on my GPU quickly fills up, and even if I delete my_tensor.grad_nonleaf immediately after calling loss.backward(), the gradient tensor is still kept alive in exactly 50% of the cases. I know this because I store the gradient tensors’ addresses in a set, and every 500th training batch I use Python’s built-in garbage collector to find all existing tensors and count those whose address is in the set; this count increases by 250 each time I do so.

Do you know how that can be? I don’t even use my_tensor.grad_nonleaf for anything yet; I just request that it be created and then delete it as soon as I get the opportunity. How can I debug this most easily?

Note that the proper solution is to use .retain_grad().
It’s really easy to create circular references between gradients and Tensors, so maybe you can set the gradient to None explicitly.
Other than that, the fact that it happens every second time might point to something else being off in your code, too. It’s hard to tell.
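Concretely, with .retain_grad() the cleanup suggested above is a one-liner, since there is no hook closure keeping the tensor alive:

```python
import torch

v = torch.randn(3, requires_grad=True)
v2 = v + 1
v2.retain_grad()
(v2 ** 2).sum().backward()

g = v2.grad      # use the gradient...
v2.grad = None   # ...then drop the reference explicitly so it can be freed
```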

Best regards


Thanks for your answer! I changed from require_nonleaf_grad to retain_grad, and now the problem seems to be gone.