Hey all,
I’m a beginner experimenting with transfer learning on resnet50, and I’ve been getting the runtime error “element 0 of tensors does not require grad and does not have a grad_fn” when attempting a training run. I’m working in a Google Colab environment, and here’s the code:
train_session_epochs = 12
cur_epochs = 0  # total epochs trained across sessions
i = 0           # batch counter
with torch.enable_grad():
    for epoch in range(train_session_epochs):
        running_loss = 0.0
        cur_epochs = cur_epochs + 1
        for inputs, labels in trainloader:
            i = i + 1
            inputs = inputs.to(device)
            labels = labels.to(device)
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
Criterion is cross-entropy loss, and I’m using SGD as the optimizer. I’ve managed to identify that loss.requires_grad is False at the point of the error. I’ve tried setting loss = Variable(criterion(outputs, labels), requires_grad=True), and I’ve used other methods to force requires_grad to be True; however, those are only workarounds that make the error go away without producing any actual loss reduction.
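My understanding of why the workaround fails (happy to be corrected): wrapping the loss in a new tensor with requires_grad=True creates a fresh leaf node that is disconnected from the model, so backward() never reaches the model’s parameters. A minimal repro of that behavior, with a plain tensor standing in for the model weights:

```python
import torch

x = torch.randn(3, requires_grad=True)   # stands in for a model parameter
y = (x * 2).detach()                      # simulate a loss that lost its graph

# the "force requires_grad" workaround: build a new leaf from the value
loss = torch.tensor(y.sum().item(), requires_grad=True)
loss.backward()                           # runs without error...

print(x.grad)                             # ...but x.grad is still None
```

So the error disappears, but no gradients ever flow to the original parameters, which matches the “no loss reduction” I’m seeing.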
The strange thing is that I have a separate Colab with nearly the exact same setup of training data, optimizer, model, etc., and that Colab throws no errors and makes learning progress when I train with it. The primary difference is that the Colab file I’m currently working on calls optimizer.load_state_dict when loading the model to train further.
Am I missing something? I’m happy to provide more code if needed.