RuntimeError: element 0 of variables does not require grad and does not have a grad_fn

None of them seem to have a grad_fn.
Did you create these tensors with requires_grad=True?

Yes, I created those tensors and gave them requires_grad=True.

That won't work and would fit the 3rd point I mentioned.
You would need to create the computation graph with differentiable operations, which will create a result tensor with a valid grad_fn.

How to do that exactly?

Here is a simple code snippet:

import torch
import torch.nn as nn

x = torch.randn(1, 1, requires_grad=True)
lin = nn.Linear(1, 1)  # your model or manual operations
out = lin(x)
print(out.grad_fn)  # a valid grad_fn, e.g. <AddmmBackward0 object at ...>
out.backward()

Your model might of course be more complicated.


Okay, but my GAN model works with BCELoss; when I try to use my own loss function, i.e. the WGAN loss, it doesn't work.

All the built-in loss functions work, but not the WGAN loss function that I have created.

The loss function I am using is: total_loss = torch.mean(h_fake) - torch.mean(h_real)

Could you check the grad_fn attributes of h_fake, h_real, and total_loss?
If that's the complete loss calculation, it should not detach anything from the computation graph.
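
A minimal sketch of that check, with a toy critic standing in for your discriminator (critic, real, and fake are placeholders here, not your actual model or data):

import torch
import torch.nn as nn

critic = nn.Linear(8, 1)               # toy stand-in for the discriminator/critic
real = torch.randn(4, 8)
fake = torch.randn(4, 8)

h_real = critic(real)
h_fake = critic(fake)
print(h_real.grad_fn, h_fake.grad_fn)  # both should show a valid grad_fn

total_loss = torch.mean(h_fake) - torch.mean(h_real)
print(total_loss.grad_fn)              # e.g. <SubBackward0 object at ...>
total_loss.backward()                  # works as long as nothing was detached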


This worked for me in March 2020.

This just happened to me.

What I did wrong was: instead of passing

logits = model(images) into loss = criterion(logits, labels)

I was passing the inputs directly: loss = criterion(images, labels)
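
A minimal sketch of the difference, using a toy model and random data (the shapes and names are just placeholders):

import torch
import torch.nn as nn

model = nn.Linear(784, 10)            # toy stand-in for the real model
criterion = nn.CrossEntropyLoss()
images = torch.randn(4, 784)          # raw inputs: requires_grad=False, no grad_fn
labels = torch.randint(0, 10, (4,))

# loss = criterion(images, labels)    # wrong: backward() would raise the RuntimeError
logits = model(images)                # the forward pass builds the graph
loss = criterion(logits, labels)      # correct: loss has a valid grad_fn
loss.backward()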


In my case, the code that was supposed to compute some gradients was wrapped in:

with torch.no_grad():
    # some code

Removing the context manager did the trick.
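
For illustration, a minimal sketch of how this breaks the graph (toy tensors only):

import torch

x = torch.randn(3, requires_grad=True)
with torch.no_grad():
    y = (x * 2).sum()    # no grad_fn is recorded inside the no_grad() block
print(y.grad_fn)         # None
# y.backward() would raise "element 0 of tensors does not require grad ..."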


Yep
I had written it beforehand:

with torch.no_grad():

Hi,
I have the same issue, but my network is not really a simple neural net. In my case the output of the network is passed to other functions, and the loss is calculated with respect to the output of these functions. I have implemented these functions myself and I am using MSELoss, but I still get this error.
Can anyone help me?

In my case, too. I spent several hours trying to fix this issue and am very glad that your snippet made my day!


Hi Ptrblck,

I want to use label smoothing with criterion = nn.CrossEntropyLoss() and a batch size of 64. The labels are random numbers between 0.8 and 0.9 and the outputs come from a sigmoid. The code is:

    b_size=64
    label=(0.9-0.8)* torch.rand(b_size) + 0.8
    label=label.to(device).type(torch.LongTensor)

    # Forward pass real batch through D
    
    netD=netD.float()
    output = netD(real_cpu).view(-1)
    # Calculate loss on all-real batch
    output1=torch.zeros(64,64)
    for ii in range(64):
        output1[:,ii]=ii
    for ii in range(64):
        output1[ii,:]= output[ii].type(torch.LongTensor)
        
    errD_real = criterion(output1, label)

and the error is:
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

By applying .type(torch.LongTensor), all the labels and outputs become 0!

You won't be able to convert your output, which is a floating point tensor, to a LongTensor without detaching it from the computation graph, since gradient calculations are only implemented for floating point values.
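
A minimal sketch of what the cast does to the graph (toy tensor, not your actual output):

import torch

x = torch.randn(4, requires_grad=True)
y = x * 2
print(y.grad_fn)                   # valid grad_fn, e.g. <MulBackward0 object at ...>

z = y.type(torch.LongTensor)       # casting to an integer dtype detaches from the graph
print(z.grad_fn, z.requires_grad)  # None False
# any backward() through z would now fail with the RuntimeError from this thread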

If you want to apply label smoothing, I would recommend taking a look at this post.

Many thanks for your help.
My case indeed is a binary classification with 0 and 1 labels. But I want to use label smoothing: instead of 1, random numbers between 0.8 and 0.9, and instead of 0, labels between 0.01 and 0.3.

How can I use the posted code as you suggested? I have 64 labels; I think the number of classes will be 64 for me because the labels are completely different? Or would you recommend any other loss function which accepts a float target, or any cross entropy implementation which accepts a float target?
If I remove the .type(torch.LongTensor), it gives me an error that a long was expected but a float was received.

I'm not sure how your use case is a binary classification if you are dealing with 64 classes.

Anyway, I just tried the linked code and it's working. Are you seeing any issues with it?

criterion = LabelSmoothingLoss(64, smoothing=0.1)  # LabelSmoothingLoss as defined in the linked post

x = torch.randn(10, 64, requires_grad=True)
y = torch.randint(0, 64, (10,))

loss = criterion(x, y)

Hey,
I'm having a similar problem with this RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn error.
I'm doing some facial analysis with this network. I have a bit of experience working in PyTorch, but I'm currently out of my depth. Since I'm doing this as part of a larger project, I decided to wrap the FAN in a PyTorch Lightning module, which, as far as I know, shouldn't affect how freezing params works. I mention it in case there is a problem with wrapping a network like this:

class LightningFAN(LightningModule):
  def __init__(self):
    LightningModule.__init__(self)
    self._fan = FAN(num_modules=4)
    ...

  def train_freeze(self):
    for i, param in enumerate(self._fan.parameters()):
      param.requires_grad = False
    trainable = [self._fan.l3, self._fan.m3, self._fan.top_m_3, self._fan.conv_last3, self._fan.bn_end3]
    for module in trainable:
      for param in module.parameters():
        param.requires_grad = True

  def forward(self, x):
    return self._fan.forward(x)

  ...
  def other_lightning_module_methods(self):
    pass
  ...

  def configure_optimizers(self):
    optimizer = RMSprop(self.parameters(), lr=self.hparams.learning_rate, weight_decay=0.0)
    return optimizer

When I call the train method on this, it errors. If I don't freeze anything, it works fine.
As a test, I started unfreezing layers until it trained correctly. I wrote a quick snippet to check which modules were frozen:

for key, val in self._fan.__dict__["_modules"].items():
  print("Key: ", key)
  try:
    print("Weight Requires Grad: ", val.weight.requires_grad)
  except Exception:
    pass
  try:
    print("Weight Grad Function: ", val.weight.grad_fn, val.weight.grad)
  except Exception:
    pass
  try:
    print("Bias Requires Grad: ", val.bias.requires_grad)
  except Exception:
    pass
  try:
    print("Bias Grad Function: ", val.bias.grad_fn, val.bias.grad)
  except Exception:
    pass
  print()

And as output I get:

Key:  conv1
Weight Requires Grad:  False
Weight Grad Function:  None None
Bias Requires Grad:  False
Bias Grad Function:  None None

Key:  bn1
Weight Requires Grad:  False
Weight Grad Function:  None None
Bias Requires Grad:  False
Bias Grad Function:  None None

Key:  conv2

Key:  conv3

Key:  conv4

Key:  m0

Key:  top_m_0

Key:  conv_last0
Weight Requires Grad:  False
Weight Grad Function:  None None
Bias Requires Grad:  False
Bias Grad Function:  None None

Key:  bn_end0
Weight Requires Grad:  False
Weight Grad Function:  None None
Bias Requires Grad:  False
Bias Grad Function:  None None

Key:  l0
Weight Requires Grad:  False
Weight Grad Function:  None None
Bias Requires Grad:  True
Bias Grad Function:  None None

Key:  bl0
Weight Requires Grad:  True
Weight Grad Function:  None None
Bias Requires Grad:  True
Bias Grad Function:  None None

Key:  al0
Weight Requires Grad:  True
Weight Grad Function:  None None
Bias Requires Grad:  True
Bias Grad Function:  None None

Key:  m1

Key:  top_m_1

Key:  conv_last1
Weight Requires Grad:  True
Weight Grad Function:  None None
Bias Requires Grad:  True
Bias Grad Function:  None None

Key:  bn_end1
Weight Requires Grad:  True
Weight Grad Function:  None None
Bias Requires Grad:  True
Bias Grad Function:  None None

Key:  l1
Weight Requires Grad:  True
Weight Grad Function:  None None
Bias Requires Grad:  True
Bias Grad Function:  None None

Key:  bl1
Weight Requires Grad:  True
Weight Grad Function:  None None
Bias Requires Grad:  True
Bias Grad Function:  None None

Key:  al1
Weight Requires Grad:  True
Weight Grad Function:  None None
Bias Requires Grad:  True
Bias Grad Function:  None None

Key:  m2

Key:  top_m_2

Key:  conv_last2
Weight Requires Grad:  True
Weight Grad Function:  None None
Bias Requires Grad:  True
Bias Grad Function:  None None

Key:  bn_end2
Weight Requires Grad:  True
Weight Grad Function:  None None
Bias Requires Grad:  True
Bias Grad Function:  None None

Key:  l2
Weight Requires Grad:  True
Weight Grad Function:  None None
Bias Requires Grad:  True
Bias Grad Function:  None None

Key:  bl2
Weight Requires Grad:  True
Weight Grad Function:  None None
Bias Requires Grad:  True
Bias Grad Function:  None None

Key:  al2
Weight Requires Grad:  True
Weight Grad Function:  None None
Bias Requires Grad:  True
Bias Grad Function:  None None

Key:  m3

Key:  top_m_3

Key:  conv_last3
Weight Requires Grad:  True
Weight Grad Function:  None None
Bias Requires Grad:  True
Bias Grad Function:  None None

Key:  bn_end3
Weight Requires Grad:  True
Weight Grad Function:  None None
Bias Requires Grad:  True
Bias Grad Function:  None None

Key:  l3
Weight Requires Grad:  True
Weight Grad Function:  None None
Bias Requires Grad:  True
Bias Grad Function:  None None

The problem actually occurs when I freeze the gradient for the l0 module's bias. Why would that be? Does this mean that none of the layers after that one are computing gradients? Also, why do no layers have gradient functions at this stage?

The .grad_fn attribute is populated on the activations during the forward pass by default, not on the parameters (which are leaf tensors).
Is your training working fine without using Lightning? If not, could you post an executable code snippet, so that we could have a look?
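
A minimal sketch of that distinction, using a toy nn.Linear as a stand-in:

import torch
import torch.nn as nn

lin = nn.Linear(1, 1)
print(lin.weight.grad_fn)   # None: parameters are leaf tensors and never get a grad_fn

x = torch.randn(1, 1)
out = lin(x)
print(out.grad_fn)          # e.g. <AddmmBackward0 object at ...>: set during the forward pass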

I am having a serious issue with this error:

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

I think the biggest problem is that it doesn't tell me which tensor is causing this error. I check and re-check and everything seems right, but I can't seem to find what is causing this. My model seems fine.

Can we have a more informative error message here, please? Perhaps once it's caught, it could print the tensor or something to identify which one is the culprit?


Addendum:

My error was caused by a torch.no_grad(). I used it while doing meta-learning (in the evaluation/testing phase), where grads are still needed for things like MAML… new territory for this bug!

Regardless, a better error message would be great.
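
For anyone hitting this in a similar meta-learning setting, a minimal sketch of re-enabling gradients locally inside an outer no_grad() block (just one possible pattern, not the actual MAML code):

import torch

w = torch.randn(3, requires_grad=True)

with torch.no_grad():              # e.g. an evaluation loop
    with torch.enable_grad():      # gradients are needed again for the inner adaptation step
        loss = (w * 2).sum()
    # loss was created with grad mode on, so it has a valid grad_fn
loss.backward()
print(w.grad)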
