Backward() crashes if the graph is empty

I’ve spent some time chasing a bug in my code. Apparently, if the computation graph has no tensors requiring grads, autograd raises an error. More specifically:

import torch

x = torch.ones((2, 2), requires_grad=False)
y = torch.ones((2, 2), requires_grad=True)
torch.sum(x * y).backward()  # ok
torch.sum(x * x).backward()  # crashes

The last line throws a cryptic error, "RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn", and detect_anomaly is not helpful in localising the problem.

In my case, the graph is dynamic, and on rare occasions the latter case occurs, resulting in the error. Would it be better to stop gradients without crashing, i.e. do nothing in backward() if the graph is empty?

You can simply avoid this error by only calling backward() if your output requires grad. That will give you the behavior you expect:

loss = torch.sum(x * x)
if loss.requires_grad:
    loss.backward()
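
For the dynamic-graph case described above, the same guard can be applied on each iteration. A minimal sketch, where the use_y flag is just a stand-in for whatever makes your graph change at runtime:

import torch

x = torch.ones((2, 2), requires_grad=False)
y = torch.ones((2, 2), requires_grad=True)

for use_y in (True, False):  # stands in for a dynamically built graph
    loss = torch.sum(x * y) if use_y else torch.sum(x * x)
    if loss.requires_grad:   # skip backward when nothing in the graph requires grad
        loss.backward()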

In general, we chose to raise an error in this case because in most cases users expect backward to actually happen when they call .backward(). So an output Tensor that doesn't require gradients is usually a bug in the user code.

Thank you for the explanation. If it is a common source of bugs, then raising an error makes sense.
Is it possible to make the message more helpful? It could say that backward() / autograd.grad() was called on a tensor that does not require grad or does not have a grad_fn, and suggest calling it under enable_grad / not under no_grad.
I.e. the part "element 0 of tensors" is misleading, and some guidance on why the tensors may be missing grad_fn would help too. I'm not sure my example covers all failure cases, though.
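
For reference, one common way the output ends up without a grad_fn even though the inputs require grad (a minimal illustration of the no_grad case mentioned above):

import torch

y = torch.ones((2, 2), requires_grad=True)
with torch.no_grad():
    out = torch.sum(y * y)  # no grad_fn is recorded inside no_grad
print(out.requires_grad)    # False, so out.backward() would raise the same error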


I think this error message is more than 3 years old :smiley: And we definitely could improve it.
Could you open an issue on GitHub asking to update it, along with your ideas for what a better message would look like?
That will be easier for us to track.