RuntimeError: element 0 of variables does not require grad and does not have a grad_fn

I have printed both the tensor y and DerivateX_g and attached the output as an image for reference.

None of them seem to have a grad_fn.
Did you create these tensors with requires_grad=True?

Yes, I created those tensors and gave them requires_grad=True.

That won’t work and would fit the 3rd point I mentioned.
You would need to create the computation graph with differentiable operations, which will create a result tensor with a valid grad_fn.

How to do that exactly?

Here is a simple code snippet:

import torch
import torch.nn as nn

x = torch.randn(1, 1, requires_grad=True)
lin = nn.Linear(1, 1)  # your model or manual differentiable operations
out = lin(x)
print(out.grad_fn)  # e.g. <AddmmBackward>, so the graph was created
out.backward()

Your model might of course be more complicated.

Okay, but my GAN model works with BCELoss; when I try to use my own loss function, i.e. WGANLoss, it doesn't work.

All the built-in loss functions work, but not my own WGANLoss function that I created.

The loss function I am using is: total_loss = (torch.mean(h_fake) - torch.mean(h_real))

Could you check the grad_fn attributes of h_fake, h_real, and total_loss?
If that’s the complete loss calculation, it should not detach anything from the computation graph.
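
For reference, a minimal sketch of what that check could look like; the critic below is just a hypothetical stand-in for your discriminator, and it assumes h_real and h_fake come straight out of it without any detach() or .item() calls in between:

import torch
import torch.nn as nn

critic = nn.Sequential(nn.Linear(10, 1))  # hypothetical stand-in for the real critic

h_real = critic(torch.randn(4, 10))
h_fake = critic(torch.randn(4, 10))

# the posted WGAN critic loss
total_loss = torch.mean(h_fake) - torch.mean(h_real)

# all three should print a valid grad_fn (e.g. AddmmBackward / SubBackward)
print(h_real.grad_fn, h_fake.grad_fn, total_loss.grad_fn)
total_loss.backward()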

This worked for me in March 2020.

This just happened to me.

What I did wrong was: instead of passing

logits = model(images) into loss = criterion(logits, labels)

I was passing the raw inputs directly: loss = criterion(images, labels)
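
In code form, a minimal self-contained sketch of the difference (the model, criterion, and data here are stand-ins for my actual objects):

import torch
import torch.nn as nn

model = nn.Linear(3 * 8 * 8, 2)        # stand-in classifier
criterion = nn.CrossEntropyLoss()
images = torch.randn(4, 3 * 8 * 8)     # stand-in image batch
labels = torch.randint(0, 2, (4,))

# wrong: loss = criterion(images, labels)
#   the raw images have no grad_fn, so loss.backward() raises the error above

# right: run the forward pass first, so the output carries a grad_fn
logits = model(images)
loss = criterion(logits, labels)
loss.backward()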

In my case, the code that was supposed to compute some gradients was wrapped in:

with torch.no_grad():
    # some code

Removing the torch.no_grad() context did the trick.
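
For anyone else hitting this, a small self-contained sketch of the failure mode (the tiny Linear model is just a placeholder):

import torch
import torch.nn as nn

model = nn.Linear(2, 1)
x = torch.randn(4, 2)

with torch.no_grad():
    out = model(x)      # no graph is recorded inside no_grad
loss = out.mean()
print(loss.grad_fn)     # None
# loss.backward()       # would raise: element 0 of tensors does not require grad ...

out = model(x)          # outside no_grad the graph is recorded
out.mean().backward()   # works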

Yep,
I had written it beforehand:

with torch.no_grad():

Hi,
I have the same issue, but my network is not really a simple neural net. In my case, the output of the network is passed to other functions, and the loss is calculated with respect to the output of these functions. I implemented these functions myself and I am using MSELoss, but I still get this error.
Can anyone help me?
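
A common cause in this kind of setup is that the extra functions leave autograd, e.g. by going through numpy or .item(), which detaches the result from the graph. A small illustrative sketch; the function names and the Linear net are made up:

import torch
import torch.nn as nn

net = nn.Linear(3, 3)
x = torch.randn(5, 3)
target = torch.randn(5, 3)
criterion = nn.MSELoss()

def post_process_bad(y):
    # going through numpy requires detaching and breaks the graph
    return torch.from_numpy(y.detach().numpy() * 2.0)

def post_process_good(y):
    # pure tensor ops keep the graph (and the grad_fn) intact
    return y * 2.0

out = net(x)
loss = criterion(post_process_good(out), target)
loss.backward()  # works; using post_process_bad(out) instead would raise the error above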

In my case, too. I spent several hours trying to fix this issue, and I'm very glad that your snippet made my day!

Hi ptrblck,

I want to use label smoothing with criterion = nn.CrossEntropyLoss() and a batch size of 64. The labels are random numbers between 0.8 and 0.9 and the outputs come from a sigmoid. The code is:

    b_size=64
    label=(0.9-0.8)* torch.rand(b_size) + 0.8
    label=label.to(device).type(torch.LongTensor)

    # Forward pass real batch through D
    
    netD=netD.float()
    output = netD(real_cpu).view(-1)
    # Calculate loss on all-real batch
    output1=torch.zeros(64,64)
    for ii in range(64):
        output1[:,ii]=ii
    for ii in range(64):
        output1[ii,:]= output[ii].type(torch.LongTensor)
        
    errD_real = criterion(output1, label)

and the error is:
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

By applying .type(torch.LongTensor), all the labels and outputs become 0!

You won’t be able to convert your output, which is a floating point tensor, to a LongTensor without detaching it from the computation graph, since gradient calculations are only implemented for floating point values.
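
A quick illustration of both effects; the tensors here are stand-ins for the posted label and model output:

import torch

label = (0.9 - 0.8) * torch.rand(4) + 0.8
print(label)                          # values between 0.8 and 0.9
print(label.type(torch.LongTensor))   # tensor([0, 0, 0, 0]), truncated towards zero

out = torch.rand(4, requires_grad=True)
out_long = out.type(torch.LongTensor)
print(out_long.requires_grad, out_long.grad_fn)  # False None, the cast detaches it from the graph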

If you want to apply label smoothing, I would recommend to have a look at this post.

Many thanks for your help.
My case is indeed a binary classification with 0 and 1 labels. But I want to use label smoothing: instead of 1, random numbers from 0.8 to 0.9, and instead of 0, labels between 0.01 and 0.3.

How can I use the posted code as you suggested? I have 64 labels; I think the number of classes will be 64 for me, because the labels are all different? Or would you recommend another loss function that accepts float targets, or any cross entropy implementation that accepts float targets?
If I remove the .type(torch.LongTensor), it gives me an error that a long was expected but a float was received.

I’m not sure how your use case is a binary classification if you are dealing with 64 classes?

Anyway, I just tried the linked code and it’s working. Are you seeing any issues with it?

criterion = LabelSmoothingLoss(64, smoothing=0.1)

x = torch.randn(10, 64, requires_grad=True)
y = torch.randint(0, 64, (10,))

loss = criterion(x, y)
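
For completeness, a rough sketch of what such a LabelSmoothingLoss could look like; this is an assumption, not the exact code from the linked post:

import torch
import torch.nn as nn
import torch.nn.functional as F

class LabelSmoothingLoss(nn.Module):
    # rough sketch of a label-smoothing cross entropy, not the exact implementation from the linked post
    def __init__(self, num_classes, smoothing=0.1):
        super().__init__()
        self.num_classes = num_classes
        self.smoothing = smoothing

    def forward(self, logits, target):
        log_probs = F.log_softmax(logits, dim=-1)
        # soft targets: 1 - smoothing on the true class, the rest spread over the other classes
        true_dist = torch.full_like(log_probs, self.smoothing / (self.num_classes - 1))
        true_dist.scatter_(1, target.unsqueeze(1), 1.0 - self.smoothing)
        return torch.mean(torch.sum(-true_dist * log_probs, dim=-1))

With a class along those lines, the snippet above produces a loss with a valid grad_fn, and loss.backward() works since nothing is detached along the way.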

Hey,
I’m having a similar problem with this RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn error.
I’m doing some facial analysis with this network. I have a bit of experience working in PyTorch, but I’m currently out of my depth. Since I’m doing this as part of a larger project, I decided to wrap the FAN in a PyTorch Lightning module, which, as far as I know, shouldn’t affect how freezing params works. I mention it in case there is a problem with wrapping a network like this:

class LightningFAN(LightningModule):
  def __init__(self):
    LightningModule.__init__(self)
    self._fan = FAN(num_modules=4)
    ...

  def train_freeze(self):
    for i, param in enumerate(self._fan.parameters()):
      param.requires_grad = False
    trainable = [self._fan.l3, self._fan.m3, self._fan.top_m_3, self._fan.conv_last3, self._fan.bn_end3]
    for module in trainable:
      for param in module.parameters():
        param.requires_grad = True

  def forward(self, x):
    return self._fan.forward(x)

  ...
  def other_lightning_module_methods():
    pass
  ...

  def configure_optimizers(self):
    optimizer = RMSprop(self.parameters(), lr=self.hparams.learning_rate, weight_decay=0.0)
    return optimizer

When I call the train method on this, it errors. If I don’t freeze anything, it works fine.
As a test, I started unfreezing layers until it trained without problems. I also wrote a quick snippet to see which modules were frozen:

for key, val in self._fan.__dict__["_modules"].items():
  print("Key: ", key)
  try:
    print("Weight Requires Grad: ", val.weight.requires_grad)
  except Exception:
    pass
  try:
    print("Weight Grad Function: ", val.weight.grad_fn, val.weight.grad)
  except Exception:
    pass
  try:
    print("Bias Requires Grad: ", val.bias.requires_grad)
  except Exception:
    pass
  try:
    print("Bias Grad Function: ", val.bias.grad_fn, val.bias.grad)
  except Exception:
    pass
  print()

And as an output I get

Key:  conv1
Weight Requires Grad:  False
Weight Grad Function:  None None
Bias Requires Grad:  False
Bias Grad Function:  None None

Key:  bn1
Weight Requires Grad:  False
Weight Grad Function:  None None
Bias Requires Grad:  False
Bias Grad Function:  None None

Key:  conv2

Key:  conv3

Key:  conv4

Key:  m0

Key:  top_m_0

Key:  conv_last0
Weight Requires Grad:  False
Weight Grad Function:  None None
Bias Requires Grad:  False
Bias Grad Function:  None None

Key:  bn_end0
Weight Requires Grad:  False
Weight Grad Function:  None None
Bias Requires Grad:  False
Bias Grad Function:  None None

Key:  l0
Weight Requires Grad:  False
Weight Grad Function:  None None
Bias Requires Grad:  True
Bias Grad Function:  None None

Key:  bl0
Weight Requires Grad:  True
Weight Grad Function:  None None
Bias Requires Grad:  True
Bias Grad Function:  None None

Key:  al0
Weight Requires Grad:  True
Weight Grad Function:  None None
Bias Requires Grad:  True
Bias Grad Function:  None None

Key:  m1

Key:  top_m_1

Key:  conv_last1
Weight Requires Grad:  True
Weight Grad Function:  None None
Bias Requires Grad:  True
Bias Grad Function:  None None

Key:  bn_end1
Weight Requires Grad:  True
Weight Grad Function:  None None
Bias Requires Grad:  True
Bias Grad Function:  None None

Key:  l1
Weight Requires Grad:  True
Weight Grad Function:  None None
Bias Requires Grad:  True
Bias Grad Function:  None None

Key:  bl1
Weight Requires Grad:  True
Weight Grad Function:  None None
Bias Requires Grad:  True
Bias Grad Function:  None None

Key:  al1
Weight Requires Grad:  True
Weight Grad Function:  None None
Bias Requires Grad:  True
Bias Grad Function:  None None

Key:  m2

Key:  top_m_2

Key:  conv_last2
Weight Requires Grad:  True
Weight Grad Function:  None None
Bias Requires Grad:  True
Bias Grad Function:  None None

Key:  bn_end2
Weight Requires Grad:  True
Weight Grad Function:  None None
Bias Requires Grad:  True
Bias Grad Function:  None None

Key:  l2
Weight Requires Grad:  True
Weight Grad Function:  None None
Bias Requires Grad:  True
Bias Grad Function:  None None

Key:  bl2
Weight Requires Grad:  True
Weight Grad Function:  None None
Bias Requires Grad:  True
Bias Grad Function:  None None

Key:  al2
Weight Requires Grad:  True
Weight Grad Function:  None None
Bias Requires Grad:  True
Bias Grad Function:  None None

Key:  m3

Key:  top_m_3

Key:  conv_last3
Weight Requires Grad:  True
Weight Grad Function:  None None
Bias Requires Grad:  True
Bias Grad Function:  None None

Key:  bn_end3
Weight Requires Grad:  True
Weight Grad Function:  None None
Bias Requires Grad:  True
Bias Grad Function:  None None

Key:  l3
Weight Requires Grad:  True
Weight Grad Function:  None None
Bias Requires Grad:  True
Bias Grad Function:  None None

The problem actually occurs when I freeze the gradient for the l0 module’s bias. Why would that be? Does this mean that none of the layers after that one are computing gradients? Also, why do no layers have gradient functions at this stage?

The .grad_fn attribute is populated on the activations during the forward pass by default, not on the parameters, which are leaf tensors.
Is your training working fine without using Lightning? If not, could you post an executable code snippet, so that we could have a look?
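
As a small standalone illustration of that difference (not tied to FAN; just a plain Linear layer):

import torch
import torch.nn as nn

lin = nn.Linear(2, 2)
print(lin.weight.grad_fn)   # None: parameters are leaf tensors and never get a grad_fn

x = torch.randn(1, 2)
out = lin(x)
print(out.grad_fn)          # something like <AddmmBackward>: the activation records the op that created it

out.mean().backward()
print(lin.weight.grad)      # gradients end up in .grad of the parameters, not in .grad_fn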