RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior

def calculate_gradient_penalty(self, real_sens, fake_sens):
    eta = torch.FloatTensor(opt.BATCH_SIZE, opt.MAX_SEQ_LEN).uniform_(0, 1)
    if self.cuda:
        eta = eta.cuda()

    # CUDALongTensor ---> LongTensor ---> FloatTensor ---> CUDAFloatTensor
    new_real_sens = real_sens.cpu().type(torch.FloatTensor).cuda()
    new_fake_sens = fake_sens.cpu().type(torch.FloatTensor).cuda()
    interpolated = eta * new_real_sens + ((1 - eta) * new_fake_sens)  # CUDAFloatTensor

    # calculate probability of interpolated examples
    prob_interpolated = self.D(interpolated.cpu().type(torch.LongTensor).cuda())  # batch_size, 1 (LongTensor)

    # define it to calculate gradient
    interpolated = Variable(interpolated, requires_grad=True)

    # calculate gradients of probabilities with respect to examples
    gradients = torch.autograd.grad(outputs=prob_interpolated, inputs=interpolated,
                                    grad_outputs=torch.ones(prob_interpolated.size()).cuda(),
                                    create_graph=True, retain_graph=True)
    grad_penalty = ((gradients.norm(2, dim=1) - 1) ** 2).mean() * self.lambda_term
    return grad_penalty

When I wrote the calculate_gradient_penalty function for WGAN-GP, I got the following error:
“RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.”
I searched for similar problems reported by others; in their cases the cause was a call to ‘.detach()’, ‘.data’, or ‘.numpy()’ somewhere in the chain. After checking, I don’t have that problem in my code, so I would like to know what is causing the error in my case.
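For reference, here is a minimal sketch (with made-up tensors, unrelated to the code above) of how a graph-breaking op such as .detach() produces this exact error:

import torch

w = torch.randn(3, requires_grad=True)
x = torch.randn(3, requires_grad=True)

# y requires grad through w, but x was detached, so x is not in y's graph
y = (w * x.detach()).sum()

# RuntimeError: One of the differentiated Tensors appears to not have been
# used in the graph. Set allow_unused=True if this is the desired behavior.
torch.autograd.grad(outputs=y, inputs=x)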


You should probably try what the error says.

gradients = torch.autograd.grad(outputs=prob_interpolated, inputs=interpolated,
                                        grad_outputs=torch.ones(prob_interpolated.size()).cuda(),
                                        create_graph=True, retain_graph=True, allow_unused=True) 

From the documentation:
allow_unused (bool, optional) – If False, specifying inputs that were not used when computing outputs (and therefore their grad is always zero) is an error. Defaults to False.

https://pytorch.org/docs/stable/autograd.html#torch.autograd.grad

Thanks for your answer. I tried that method before posting my topic, and it failed. When I set allow_unused=True as the error suggests, the gradient I get is None, which also confuses me.


Oh, so if I am correct, you’ll get None for the unused inputs; for the ones that are used, you’ll probably get some gradients.

I am also facing the same issue, and I get None for all the gradients. Did you solve this issue? @Simon_fei


I am facing the same issue. Did you solve it?


I am facing the same issue. Did you manage to solve it?


How did you check this? Are there any tools or methods to verify the computational graph?
I have the same problem, but it’s not easy for me to check since my model is a little complicated.

@ptrblck @albanD apologies for the direct ping. Unsure who to ping.

I am facing this issue too, but I know this should not be happening: all weights should be used.

Is it possible to improve the error message and display the names of the weights causing this issue? It would help me know where to even start debugging this.

My model is a simple 5-layer CNN that takes mini-ImageNet data, so there shouldn’t be this issue, but I would love to get better error messages :slight_smile:

RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.

related: Error from maml, gradients var not found -- why? · Issue #314 · learnables/learn2learn · GitHub

here is the GitHub issue for this

Hi,

You can’t print which input is problematic directly, no.

But you can pass the allow_unused=True flag to remove the error and then check which of the returned gradients are None. These correspond to the inputs that were not connected to the graph.
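A minimal sketch of that check, where prob_interpolated and the tuple of candidate inputs called params are placeholder names from the original post:

grads = torch.autograd.grad(outputs=prob_interpolated,
                            inputs=params,
                            grad_outputs=torch.ones_like(prob_interpolated),
                            retain_graph=True,
                            allow_unused=True)
# Any None entry corresponds to an input that was not used in the graph
for p, g in zip(params, grads):
    if g is None:
        print("not connected to the graph:", p.shape)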


Hi!

it should be trivial from the PyTorch dev side. If PyTorch already knows some params are NOT in the forward pass (since it failed to do backprop through them), then instead of giving me an existential quantifier, tell me which one concretely is not being used. Then I can debug it. Otherwise, where do I even start in a large model?

Correct me if I’m wrong :slight_smile:


Let’s discuss in the issue to avoid duplicates!

Thanks for your reply. The documentation for allow_unused should be updated; it is confusing about the consequences of setting allow_unused=True.

Hi, I encountered a similar problem when I reproduced the gradient penalty of WGAN-GP. It appears that changing the view/shape of the tensor, say X, that is passed to the inputs argument causes this problem. I don’t know if this is true for the original post, or whether this is desired behavior. It can be replicated with the following code:

import torch
from torch.autograd import grad

a = torch.randn((3, 4), requires_grad=True)
b = a @ (torch.arange(4).float() + 1).reshape(4, 1)
c = b.sum()

# Passing a view of `a` as the input fails with
# RuntimeError: ...Tensors appears to not have been used in the graph. Set allow_unused=True...
gs = grad(c, a.view(-1), torch.ones_like(c), True, True)[0]

gs = grad(c, a, torch.ones_like(c), True, True)[0]  # success

The PyTorch version that I used is 1.10.2.

I think the failure in your code is expected as you are passing a view of a (with a valid .grad_fn) as the input to torch.autograd.grad. a.view(-1) is thus also not a leaf tensor anymore.
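Building on the snippet above, one way around it (a sketch, not the only option) is to differentiate with respect to the leaf tensor a itself and reshape the resulting gradient afterwards:

import torch
from torch.autograd import grad

a = torch.randn((3, 4), requires_grad=True)
b = a @ (torch.arange(4).float() + 1).reshape(4, 1)
c = b.sum()

# Differentiate w.r.t. the leaf tensor `a`, then flatten the gradient itself
gs = grad(c, a, torch.ones_like(c), retain_graph=True, create_graph=True)[0]
gs_flat = gs.view(-1)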


Sounds reasonable! Thank you for the reply.

I am facing the same issue. Did anyone find a workaround for this?


@ptrblck @albanD how do I set allow_unused=True globally? I just have a normal .backward(), so I never actually call torch.autograd.grad(outputs, inputs, grad_outputs=None, retain_graph=None, create_graph=False, only_inputs=True, allow_unused=False, is_grads_batched=False) directly myself, as the docs suggest.

How do I set this to True? (I’m hoping this will suppress the error, since it’s likely that some params of my ViT/transformer aren’t being used in the forward pass, and that’s fine.)
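For reference, a minimal sketch of how unused parameters can be located with a plain .backward() (the Toy module and its layers are made-up placeholders): after the backward pass, any parameter whose .grad is still None never entered the graph.

import torch
import torch.nn as nn

# Toy model with a layer that is deliberately never used in forward(),
# just to illustrate the diagnostic.
class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.used = nn.Linear(4, 2)
        self.unused = nn.Linear(4, 2)  # never called in forward()

    def forward(self, x):
        return self.used(x)

model = Toy()
loss = model(torch.randn(8, 4)).sum()
loss.backward()

# Parameters whose .grad is still None were not part of the graph
print([name for name, p in model.named_parameters() if p.grad is None])
# -> ['unused.weight', 'unused.bias']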