I got the same error, RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn, with my code. My notebook is here: Finetune_open_clip_torch-peft-RandLoRA | Kaggle. I tried to fine-tune open-clip, but I'm not sure why I get this error even though I enabled requires_grad for some parameters. It would be great if you could give me a hand and tell me what's wrong with my code. Thanks
You could debug this issue by checking where the forward activation is being detached during the model execution by printing the .grad_fn at various places. This would allow you to narrow down the operation or module which is detaching the computation graph.
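As an illustration, here is a minimal sketch of that kind of probing (the toy Sequential model and the loop over submodules are just placeholders; in your case you would print .grad_fn inside or between the open-clip submodules you suspect):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 2))

x = torch.randn(4, 8)
out = x
for name, module in model.named_children():
    out = module(out)
    # If grad_fn is None here, this module (or something before it)
    # detached the tensor from the computation graph.
    print(f"{name}: grad_fn = {out.grad_fn}")
```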
Thanks for your quick response. I printed result.grad_fn and found that it might indeed be detached from the computation graph. The author of the RandLoRA code has already replied to me and updated the code. I printed result.grad_fn again, and this time the tensor was attached to the computation graph.
If you don't mind, could you do me a favor? After addressing this issue, I ran the training loop, but the loss remained unchanged and the model did not update or learn properly. The Linear layer worked correctly, but MultiheadAttention did not. You can reproduce my results by running the notebook. I appreciate your help. Thanks
Could you describe why you think the MHA layer does not work?
Hi @ptrblck, when running the training loop, the loss and accuracy were almost unchanged after each epoch. I inspected the LoRA weights, which did change somewhat, but in the end this barely affected or improved the model's performance. So I think the MHA layer might not be working. It's weird to me, and I'm not sure what's wrong with the code.
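For what it's worth, one generic way to narrow this down is to check, after loss.backward(), which trainable parameters actually receive gradients, and how much they move after optimizer.step(). This is only a sketch, not tied to the RandLoRA code; model and optimizer stand in for whatever is built in the notebook:

```python
import torch

def report_grads(model):
    # Print gradient norms of all trainable parameters.
    # Entries with grad=None or a ~0 norm are effectively not being trained.
    for name, p in model.named_parameters():
        if p.requires_grad:
            g = None if p.grad is None else p.grad.norm().item()
            print(f"{name}: grad norm = {g}")

def snapshot(model):
    # Copy the current values of the trainable parameters.
    return {n: p.detach().clone()
            for n, p in model.named_parameters() if p.requires_grad}

# Inside the training loop:
# loss.backward()
# report_grads(model)          # do the MHA LoRA params get gradients?
# before = snapshot(model)
# optimizer.step()
# after = snapshot(model)
# for n in before:
#     delta = (after[n] - before[n]).abs().max().item()
#     print(f"{n}: max change after step = {delta:.3e}")
```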
I think you're trying to call .backward() on a tensor that does not track gradients, i.e., it has requires_grad=False and does not originate from any differentiable computation.
Make sure:
- LoRA-injected layers have requires_grad=True
- Model outputs go into a loss function (like nn.CrossEntropyLoss)
- The tensor you call .backward() on is part of the computation graph
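A minimal sketch of those three checks (the tiny Linear classifier below is just a stand-in for the real, partially frozen model):

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 4)          # stand-in for your (partially frozen) model
criterion = nn.CrossEntropyLoss()

# 1) The layers you want to train must require grad.
for name, p in model.named_parameters():
    print(name, p.requires_grad)

x = torch.randn(8, 16)
target = torch.randint(0, 4, (8,))

# 2) Feed the model output into the loss function.
logits = model(x)
loss = criterion(logits, target)

# 3) The loss must be attached to the computation graph; if loss.grad_fn
#    is None, backward() raises the "does not require grad" error.
print(loss.requires_grad, loss.grad_fn)
loss.backward()
```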