RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior

def calculate_gradient_penalty(self, real_sens, fake_sens):
    eta = torch.FloatTensor(opt.BATCH_SIZE, opt.MAX_SEQ_LEN).uniform_(0, 1)
    if self.cuda:
        eta = eta.cuda()

    # CUDALongTensor ---> LongTensor ---> FloatTensor ---> CUDAFloatTensor
    new_real_sens = real_sens.cpu().type(torch.FloatTensor).cuda()
    new_fake_sens = fake_sens.cpu().type(torch.FloatTensor).cuda()
    interpolated = eta * new_real_sens + ((1 - eta) * new_fake_sens)  # CUDAFloatTensor

    # calculate probability of interpolated examples
    prob_interpolated = self.D(interpolated.cpu().type(torch.LongTensor).cuda())  # batch_size, 1 (LongTensor)

    # define it to calculate gradient
    interpolated = Variable(interpolated, requires_grad=True)

    # calculate gradients of probabilities with respect to examples
    gradients = torch.autograd.grad(outputs=prob_interpolated, inputs=interpolated,
                                    grad_outputs=torch.ones(prob_interpolated.size()).cuda(),
                                    create_graph=True, retain_graph=True)
    grad_penalty = ((gradients.norm(2, dim=1) - 1) ** 2).mean() * self.lambda_term
    return grad_penalty

When I wrote the calculate_gradient_penalty function for WGAN-GP, I got the following error:
“RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.”
I searched for similar problems reported by others; in their cases the cause was a call to ‘.detach()’, ‘.data’, or ‘.numpy()’ somewhere in the chain. After checking, I don’t have that problem in my code, so I would like to know what is causing the error in my case.
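For reference, here is a minimal sketch (with made-up tensors, unrelated to the code above) of how a graph-breaking op such as .detach() produces this exact error:

import torch

w = torch.randn(3, requires_grad=True)
x = torch.randn(3, requires_grad=True)

# y requires grad through w, but x was detached, so x is not in y's graph
y = (w * x.detach()).sum()

# RuntimeError: One of the differentiated Tensors appears to not have been
# used in the graph. Set allow_unused=True if this is the desired behavior.
torch.autograd.grad(outputs=y, inputs=x)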


You should probably try what the error says.

gradients = torch.autograd.grad(outputs=prob_interpolated, inputs=interpolated,
                                        grad_outputs=torch.ones(prob_interpolated.size()).cuda(),
                                        create_graph=True, retain_graph=True, allow_unused=True) 

From the documentation:
allow_unused (bool, optional) – If False, specifying inputs that were not used when computing outputs (and therefore their grad is always zero) is an error. Defaults to False.

https://pytorch.org/docs/stable/autograd.html#torch.autograd.grad

Thanks for your answer. I tried that method before posting my topic, and it failed. When I set allow_unused=True as the error suggests, the gradient I get is None, which also confuses me.


Oh, so if I am correct, you’ll get None for the unused inputs; for the ones that are used, you’ll probably get some gradients.

I am also facing the same issue, and I get None for all the gradients. Did you solve this issue? @Simon_fei


I am facing the same issue. Did you solve it?


I am facing the same issue. Did you manage to solve it?


How did you check this? Are there any tools or methods to verify the computational graph?
I have the same problem, but it’s not easy for me to check since my model is a little complicated.

@ptrblck @albanD apologies for the direct ping. Unsure who to ping.

I am facing this issue too, but I know this should not be happening: all weights should be used.

Is it possible to improve the error message and display the names of the weights causing this issue? It would help me know where to even start debugging this.

My model is a simple 5-layer CNN that takes mini-ImageNet data, so there shouldn’t be this issue, but I would love to get better error messages :slight_smile:

RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.

related: Error from maml, gradients var not found -- why? · Issue #314 · learnables/learn2learn · GitHub

here is the GitHub issue for this

Hi,

You can’t print which input is problematic directly, no.

But you can pass the allow_unused=True flag to remove the error and then check which of the returned gradients are None. These correspond to the inputs that were not connected to the graph.
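A minimal sketch of that check, where prob_interpolated and the tuple of candidate inputs called params are placeholder names from the original post:

grads = torch.autograd.grad(outputs=prob_interpolated,
                            inputs=params,
                            grad_outputs=torch.ones_like(prob_interpolated),
                            retain_graph=True,
                            allow_unused=True)
# Any None entry corresponds to an input that was not used in the graph
for p, g in zip(params, grads):
    if g is None:
        print("not connected to the graph:", p.shape)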


Hi!

it should be trivial from the PyTorch dev side. If PyTorch already knows some params are NOT in the forward pass (since it failed to do backprop through them), then instead of giving me an existential quantifier, tell me which one concretely is not being used. Then I can debug it. Otherwise, where do I even start in a large model?

Correct me if I’m wrong :slight_smile:


Let’s discuss in the issue to avoid duplicates!

Thanks for your reply. The documentation for allow_unused should be updated; it is confusing about the consequences of setting allow_unused=True.

Hi, I encountered a similar problem when I reproduced the gradient penalty of WGAN-GP. It appears that changing the view/shape of the tensor, say X, that is passed to the inputs argument causes this problem. I don’t know if this is true for the original post, or whether this is desired behavior. It can be replicated with the following code:

import torch
from torch.autograd import grad

a = torch.randn((3, 4), requires_grad=True)
b = a @ (torch.arange(4).float() + 1).reshape(4, 1)
c = b.sum()

# Passing a view of `a` as the input fails with
# RuntimeError: ...Tensors appears to not have been used in the graph. Set allow_unused=True...
gs = grad(c, a.view(-1), torch.ones_like(c), True, True)[0]

gs = grad(c, a, torch.ones_like(c), True, True)[0]  # success

The PyTorch version that I used is 1.10.2.

I think the failure in your code is expected as you are passing a view of a (with a valid .grad_fn) as the input to torch.autograd.grad. a.view(-1) is thus also not a leaf tensor anymore.
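Building on the snippet above, one way around it (a sketch, not the only option) is to differentiate with respect to the leaf tensor a itself and reshape the resulting gradient afterwards:

import torch
from torch.autograd import grad

a = torch.randn((3, 4), requires_grad=True)
b = a @ (torch.arange(4).float() + 1).reshape(4, 1)
c = b.sum()

# Differentiate w.r.t. the leaf tensor `a`, then flatten the gradient itself
gs = grad(c, a, torch.ones_like(c), retain_graph=True, create_graph=True)[0]
gs_flat = gs.view(-1)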


Sounds reasonable! Thank you for the reply.

I am facing the same issue. Did anyone find a workaround for this?


@ptrblck @albanD how do I set allow_unused=True globally? I just have a normal .backward(), so I never actually call torch.autograd.grad(outputs, inputs, grad_outputs=None, retain_graph=None, create_graph=False, only_inputs=True, allow_unused=False, is_grads_batched=False) directly myself, as the docs suggest.

How do I set this to True? (I’m hoping this will suppress the error, since it’s likely that some params of my ViT/transformer aren’t being used in the forward pass, and that’s fine.)
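For reference, a minimal sketch of how unused parameters can be located with a plain .backward() (the Toy module and its layers are made-up placeholders): after the backward pass, any parameter whose .grad is still None never entered the graph.

import torch
import torch.nn as nn

# Toy model with a layer that is deliberately never used in forward(),
# just to illustrate the diagnostic.
class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.used = nn.Linear(4, 2)
        self.unused = nn.Linear(4, 2)  # never called in forward()

    def forward(self, x):
        return self.used(x)

model = Toy()
loss = model(torch.randn(8, 4)).sum()
loss.backward()

# Parameters whose .grad is still None were not part of the graph
print([name for name, p in model.named_parameters() if p.grad is None])
# -> ['unused.weight', 'unused.bias']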