RuntimeError: element 0 of variables does not require grad and does not have a grad_fn

@sonal_garg and other people who may be tempted to use `loss.requires_grad = True` or a similar fix: note that if you are even able to set `requires_grad` on your loss, something is very wrong.

You can only set `requires_grad` on leaf nodes of the computation graph, that is, tensors created directly by you rather than produced by an operation, which therefore do not propagate the gradient any further back. If your loss does not propagate the gradient to the rest of your model, you are not training the model, and the loss is useless (even if you force it to require a gradient).
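A minimal sketch of this, with made-up tensors for illustration: setting `requires_grad` on a non-leaf raises an error, and even on a detached (leaf) loss it just silences the symptom without training anything.

```python
import torch

# A leaf tensor: created directly by the user, no grad_fn.
w = torch.randn(3, requires_grad=True)
print(w.is_leaf, w.grad_fn)        # True None

# A non-leaf tensor: the result of an operation, so it has a grad_fn.
loss = (w ** 2).sum()
print(loss.is_leaf, loss.grad_fn)  # False <SumBackward0 ...>

# Setting requires_grad on a non-leaf raises a RuntimeError.
try:
    loss.requires_grad = True
except RuntimeError as e:
    print(e)  # "you can only change requires_grad flags of leaf variables..."

# A detached loss *is* a leaf, so the flag can be set -- but backward()
# then stops at the loss itself and no parameter receives a gradient.
detached = loss.detach()
detached.requires_grad = True
detached.backward()
print(w.grad)  # None: the model never sees the gradient
```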

Chances are, some layer is set to `requires_grad=False` somewhere in the code (possibly all of them, or just the last layer), or the computation graph is detached at some point. Another common cause is that the forward pass computing the loss is wrapped in a `with torch.no_grad():` block, so no graph is recorded at all.
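A short diagnostic sketch for these cases, using a toy `nn.Linear` model for illustration: check the parameters' `requires_grad` flags and the loss's `grad_fn` before calling `backward()`.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 1)
x = torch.randn(8, 4)
target = torch.randn(8, 1)

# Pitfall 1: frozen parameters. If every parameter that feeds the loss has
# requires_grad=False, the loss has no grad_fn and backward() raises the error.
for name, p in model.named_parameters():
    print(name, p.requires_grad)   # should be True for trainable layers

# Pitfall 2: forward pass under no_grad. The graph is never recorded,
# so the loss has no grad_fn either.
with torch.no_grad():
    loss = nn.functional.mse_loss(model(x), target)
print(loss.grad_fn)  # None -> loss.backward() would fail here

# Correct: run the forward pass with gradient tracking enabled.
loss = nn.functional.mse_loss(model(x), target)
print(loss.grad_fn)  # <MseLossBackward0 ...>
loss.backward()
```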
