Loss.backward(): element 0 of tensors does not require grad and does not have a grad_fn

Hey all,

I’m a beginner experimenting with ResNet50 transfer learning, and I keep getting the runtime error “element 0 of tensors does not require grad and does not have a grad_fn” when I attempt a training run. I’m working in a Google Colab environment, and here’s the code:

train_session_epochs = 12

with torch.enable_grad():
  for epoch in range(train_session_epochs):
    running_loss = 0.0
    cur_epochs = cur_epochs + 1  # cumulative epoch counter, defined in an earlier cell

    for inputs, labels in trainloader:
      i = i + 1  # batch counter, also defined in an earlier cell

      inputs = inputs.to(device)
      labels = labels.to(device)

      optimizer.zero_grad()
      outputs = model(inputs)
      loss = criterion(outputs, labels)
      loss.backward()  # <- this is where the RuntimeError is raised
      optimizer.step()

      running_loss += loss.item()

Criterion is cross-entropy loss, and I’m using SGD as the optimizer. I’ve identified that loss.requires_grad is False. I’ve tried setting loss = Variable(criterion(outputs, labels), requires_grad=True), as well as other ways of forcing requires_grad to be True, but those are only workarounds: they make the error go away without the loss actually decreasing.

The strange thing is that I have a separate Colab notebook with nearly the same setup (training data, optimizer, model, etc.), and that notebook trains without errors and shows learning progress. The primary difference is that the notebook I’m currently working on calls optimizer.load_state_dict when loading the model to train it further.
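For reference, the loading step looks roughly like this (the checkpoint path and dictionary keys here are simplified from my notebook):

# Roughly how I restore the model and optimizer to continue training
# (checkpoint path and key names are simplified placeholders)
checkpoint = torch.load('checkpoint.pth', map_location=device)
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
cur_epochs = checkpoint['epoch']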

Am I missing anything? I’ll be happy to provide more code if needed.

Did you freeze all trainable parameters or disable gradient computation globally? If not, could you post a minimal, executable code snippet reproducing the issue?

I don’t think I have frozen anything, and I haven’t run torch.set_grad_enabled(False). The only place where I disable gradient computation is the accuracy check, which I run under torch.no_grad().
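That accuracy check looks roughly like this (simplified; testloader is my validation loader):

# Simplified version of my accuracy check; gradients are disabled
# only inside this block, so it shouldn't affect training
model.eval()
correct = 0
total = 0
with torch.no_grad():
    for inputs, labels in testloader:
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = model(inputs)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
print(f'Accuracy: {100 * correct / total:.2f}%')
model.train()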

Here’s my notebook: Google Colab

In your code you are freezing all parameters in:

for param in model.parameters():
    param.requires_grad = False

However, you are replacing the .fc layer afterwards, so its new parameters should require gradients. Did you execute the cells in a different order by mistake? I cannot reproduce the issue locally.
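If the cell ordering is the problem, the usual fine-tuning pattern is to freeze first and replace the head afterwards, so that the new layer’s parameters require gradients. Something along these lines (a sketch, assuming torchvision’s resnet50 and a 10-class task):

import torch
import torch.nn as nn
from torchvision import models

model = models.resnet50(pretrained=True)

# Freeze the pretrained backbone first ...
for param in model.parameters():
    param.requires_grad = False

# ... then replace the head; the new layer's parameters
# require gradients by default (10 classes is just an example)
model.fc = nn.Linear(model.fc.in_features, 10)

# Pass only the trainable parameters to the optimizer
optimizer = torch.optim.SGD(model.fc.parameters(), lr=0.001, momentum=0.9)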

I see. After changing requires_grad back to True, it works as normal. I believe I froze the layers some time ago because a CUDA out-of-memory error kept getting thrown during training runs. Thanks for the help!
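For anyone who finds this later, the change was along these lines:

# Re-enable gradients on the parameters to be trained
# (re-enabling everything here; unfreezing only model.fc
# would also work if GPU memory is tight)
for param in model.parameters():
    param.requires_grad = True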