Do I need loss.backward before using autograd.grad() on loss?

Hi Community,

To compute the gradient of the loss with autograd.grad(), do I need to call loss.backward() first? I need these gradients to update the model, but I’m doing two backward passes, so autograd.grad() is needed. I’m wondering whether the additional backward pass using autograd.grad() also requires a .backward() call first.

Many Thanks!

Hi Gears!

No, there is no need to call loss.backward() first.

The two functions do similar things: autograd.grad() computes and returns
gradients with respect to its inputs argument, while loss.backward()
computes gradients with respect to the leaf variables of the graph of which
loss is the root and populates the .grad properties of those leaf variables
with the computed gradients.
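
As an illustration (a minimal sketch with made-up tensors), grad() returns the gradient while backward() fills in .grad:

import torch

w = torch.randn(3, requires_grad=True)    # a leaf variable
x = torch.randn(3)
loss = (w * x).sum()

# autograd.grad() returns the gradient and leaves w.grad untouched.
(g,) = torch.autograd.grad(loss, w, retain_graph=True)
print(w.grad)    # None -- .grad has not been populated
print(g)         # gradient of loss with respect to w (equals x)

# loss.backward() instead populates the .grad attribute of the leaf variables.
loss.backward()
print(w.grad)    # now holds the same gradient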

Both require that some forward pass has been performed so that a
computation graph is present and neither requires that some form of
.backward() or grad() be called in advance.

Think carefully through the logic of why you are doing two backward passes.
There are legitimate use cases for doing this, but they usually require some
care to get right.

By default, both .backward() and autograd.grad() free the computation
graph, so something like:

loss.backward()
autograd.grad(loss, inputs)

will fail (because the graph will be freed), while

loss.backward(retain_graph=True)
autograd.grad(loss, inputs)

will work.
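
Here is a minimal, self-contained sketch of the two cases (the tensors are made up):

import torch

w = torch.randn(3, requires_grad=True)

# Case 1: by default the first backward pass frees the graph,
# so a second pass over the same graph raises a RuntimeError.
loss = (w ** 2).sum()
loss.backward()
try:
    torch.autograd.grad(loss, w)
except RuntimeError as err:
    print("second pass failed:", err)

# Case 2: retain the graph on the first pass, and the second pass works.
loss = (w ** 2).sum()              # fresh forward pass, fresh graph
loss.backward(retain_graph=True)
(g,) = torch.autograd.grad(loss, w)
print(g)                           # equals 2 * w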

Also note that unless you arrange your forward pass carefully to avoid the
problem, something like:

loss.backward(retain_graph=True)
opt.step()
autograd.grad(loss, inputs)

will almost certainly fail with an inplace-modification error. This is because
opt.step() performs inplace modifications on the parameters it is optimizing.

If you’re trying to do something like this, first make sure that it is necessary
and that you understand why. Then figure out the modifications to your
forward pass needed to avoid inplace-modification errors.
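
As a sketch of how this failure shows up, and one way around it (the model and optimizer here are hypothetical placeholders):

import torch

model = torch.nn.Linear(4, 1)       # hypothetical model
opt = torch.optim.SGD(model.parameters(), lr=0.1)
params = list(model.parameters())
x = torch.randn(8, 4)

loss = model(x).sum()
loss.backward(retain_graph=True)
opt.step()                          # modifies the parameters in place

# The retained graph saved the old parameter values, so this line would
# typically raise an inplace-modification RuntimeError:
# torch.autograd.grad(loss, params)

# One way around it: redo the forward pass after the step so the graph is
# rebuilt from the updated parameters.
loss = model(x).sum()
grads = torch.autograd.grad(loss, params)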

Best.

K. Frank


Thanks so much for your detailed reply K. Frank!

I’m implementing a meta-learning algorithm (MAML) where a second backward pass is required. The pseudocode looks like this (see https://arxiv.org/pdf/1703.03400.pdf, top of page 3):

for task in many_tasks:
    part_1_data, part_2_data = task
    adapted_parameter = original_parameter - coefficient * (gradient of loss on part_1_data with respect to original_parameter)
    [1] final_parameter = original_parameter - another_coefficient * (gradient of the loss of adapted_parameter on part_2_data (the query data), taken with respect to original_parameter)

In this case, calculating the gradients in [1] needs a second pass. Following your explanation, I found that computing the gradient of the part_1_data loss with autograd.grad(..., create_graph=True) works well!
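
Something along these lines, a minimal single-task sketch where the linear model, data, and learning rates are just placeholders, with create_graph=True on the inner gradient so the outer gradient in [1] can flow through the adapted parameter:

import torch

torch.manual_seed(0)
w = torch.randn(4, 1, requires_grad=True)                     # original parameter
inner_lr, outer_lr = 0.1, 0.01

x_support, y_support = torch.randn(8, 4), torch.randn(8, 1)   # part_1_data
x_query, y_query = torch.randn(8, 4), torch.randn(8, 1)       # part_2_data

# Inner step: create_graph=True keeps the gradient itself in the graph,
# so the outer gradient can flow back through the adaptation.
support_loss = ((x_support @ w - y_support) ** 2).mean()
(g,) = torch.autograd.grad(support_loss, w, create_graph=True)
w_adapted = w - inner_lr * g

# Outer step ([1] above): gradient of the query loss, evaluated at the
# adapted parameter, taken with respect to the original parameter w.
query_loss = ((x_query @ w_adapted - y_query) ** 2).mean()
(meta_g,) = torch.autograd.grad(query_loss, w)

with torch.no_grad():
    w -= outer_lr * meta_g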