I got the same error, RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn, with my code. My notebook is here: Finetune_open_clip_torch-peft-RandLoRA | Kaggle. I tried to fine-tune open-clip, but I'm not sure why I get this error even though I enabled requires_grad for some parameters. It would be great if you could give me a hand and tell me what's wrong with my code. Thanks
You could debug this issue by checking where the forward activation is being detached during the model execution by printing the .grad_fn at various places. This would allow you to narrow down the operation or module which is detaching the computation graph.
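As an illustration, here is a minimal sketch of that kind of probing (the toy Sequential model and the loop over submodules are just placeholders; in your case you would print .grad_fn inside or between the open-clip submodules you suspect):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 2))

x = torch.randn(4, 8)
out = x
for name, module in model.named_children():
    out = module(out)
    # If grad_fn is None here, this module (or something before it)
    # detached the tensor from the computation graph.
    print(f"{name}: grad_fn = {out.grad_fn}")
```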
Thanks for your quick response. I printed result.grad_fn and found that it might indeed be detached from the computation graph. The author of the RandLoRA code has already replied to me and updated the code. I printed result.grad_fn again, and this time the tensor was attached to the computation graph.
If you don't mind, could you do me a favor? After addressing this issue, I ran the training loop, but the loss remained unchanged and the model did not update or learn properly. The Linear layer worked correctly, but MultiheadAttention did not. You can reproduce my results by running the notebook. I appreciate your help. Thanks
Could you describe why you think the MHA layer does not work?
Hi @ptrblck, when running the training loop, the loss and accuracy were almost unchanged after each epoch. I inspected the LoRA weights, which did change somewhat, but in the end this barely affected or improved the model's performance. So I think the MHA layer might not be working. It's weird to me, and I'm not sure what's wrong with the code.
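For what it's worth, one generic way to narrow this down is to check, after loss.backward(), which trainable parameters actually receive gradients, and how much they move after optimizer.step(). This is only a sketch, not tied to the RandLoRA code; model and optimizer stand in for whatever is built in the notebook:

```python
import torch

def report_grads(model):
    # Print gradient norms of all trainable parameters.
    # Entries with grad=None or a ~0 norm are effectively not being trained.
    for name, p in model.named_parameters():
        if p.requires_grad:
            g = None if p.grad is None else p.grad.norm().item()
            print(f"{name}: grad norm = {g}")

def snapshot(model):
    # Copy the current values of the trainable parameters.
    return {n: p.detach().clone()
            for n, p in model.named_parameters() if p.requires_grad}

# Inside the training loop:
# loss.backward()
# report_grads(model)          # do the MHA LoRA params get gradients?
# before = snapshot(model)
# optimizer.step()
# after = snapshot(model)
# for n in before:
#     delta = (after[n] - before[n]).abs().max().item()
#     print(f"{n}: max change after step = {delta:.3e}")
```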
I think you're trying to call .backward() on a tensor that does not track gradients, i.e., it has requires_grad=False and does not originate from any differentiable computation.
Make sure:
- LoRA-injected layers have requires_grad=True
- Model outputs go into a loss function (like nn.CrossEntropyLoss)
- The tensor you call .backward() on is part of the computation graph
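A minimal sketch of those three checks (the tiny Linear classifier below is just a stand-in for the real, partially frozen model):

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 4)          # stand-in for your (partially frozen) model
criterion = nn.CrossEntropyLoss()

# 1) The layers you want to train must require grad.
for name, p in model.named_parameters():
    print(name, p.requires_grad)

x = torch.randn(8, 16)
target = torch.randint(0, 4, (8,))

# 2) Feed the model output into the loss function.
logits = model(x)
loss = criterion(logits, target)

# 3) The loss must be attached to the computation graph; if loss.grad_fn
#    is None, backward() raises the "does not require grad" error.
print(loss.requires_grad, loss.grad_fn)
loss.backward()
```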