To compute the gradient of the loss with autograd.grad(), do I need to call loss.backward() first? I need these gradients to update the model, but I’m doing two backward passes, so I believe autograd.grad() is needed. I’m wondering whether the additional backward pass using autograd.grad() also requires a prior call to .backward().

The two functions do similar things: autograd.grad() computes and returns
gradients with respect to its inputs argument, while loss.backward()
computes gradients with respect to the leaf variables of the graph of which
loss is the root, populating the .grad attributes of those leaf variables
with the computed gradients.

Both require that some forward pass has been performed so that a
computation graph is present and neither requires that some form of .backward() or grad() be called in advance.
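A minimal sketch of the contrast (the tensor here is just an illustrative example):

```python
import torch

x = torch.tensor([2.0], requires_grad=True)  # a leaf variable
loss = (x ** 2).sum()  # the forward pass builds the graph

# autograd.grad() returns the gradient and leaves x.grad untouched.
# retain_graph=True keeps the graph alive for the backward() call below.
(g,) = torch.autograd.grad(loss, x, retain_graph=True)
print(g, x.grad)  # tensor([4.]) None

# backward() returns nothing; it populates x.grad instead.
loss.backward()
print(x.grad)  # tensor([4.])
```

Note that neither call needed any prior backward pass; only the forward pass was required.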

Think through carefully the logic of why you are doing two backward passes.
There are legitimate use cases for doing this, but they usually require some
care to get right.

By default, both .backward() and autograd.grad() free the computation
graph, so something like:

loss.backward()
autograd.grad(loss, inputs)

will fail (because the graph will be freed), while

loss.backward(retain_graph=True)
opt.step()
autograd.grad(loss, inputs)

will almost certainly fail with an inplace-modification error. This is because opt.step() performs inplace modifications on the parameters it is optimizing.

If you’re trying to do something like this, first make sure that it is necessary
and that you understand why. Then figure out the modifications to your
forward pass needed to avoid inplace-modification errors.
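A sketch of that failure mode and of one fix, with a hypothetical one-parameter setup:

```python
import torch

w = torch.nn.Parameter(torch.tensor([1.0]))
opt = torch.optim.SGD([w], lr=0.1)

loss = (w ** 2).sum()
loss.backward(retain_graph=True)  # populates w.grad; keeps the graph alive
opt.step()  # modifies w in place: w is now 1.0 - 0.1 * 2.0 = 0.8

# The retained graph saved the pre-step value of w, so reusing it fails:
raised = False
try:
    torch.autograd.grad(loss, w)
except RuntimeError:
    raised = True  # the inplace-modification error

# The fix: redo the forward pass so the graph reflects the updated w.
new_loss = (w ** 2).sum()
(g,) = torch.autograd.grad(new_loss, w)  # 2 * 0.8 = 1.6
```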

I’m implementing a meta-learning algorithm (MAML) in which a second backward pass is required. The pseudocode (from the top of page 3 of https://arxiv.org/pdf/1703.03400.pdf) looks like:

for task in many_tasks:
    part_1_data, part_2_data = task
    adapted_parameter = original_parameter - coefficient * (gradient of loss on part_1_data, with respect to original_parameter)
    [1] final_parameter = original_parameter - another_coefficient * (gradient of loss of adapted_parameter on part_2_data, with respect to original_parameter)

In this case, calculating the gradients in [1] needs a second pass. Following your explanation, I found that computing the inner gradient with autograd.grad(loss_on_part_1_data, original_parameter, create_graph=True) (the loss is the outputs argument, not inputs) works well!
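For concreteness, a single-task sketch of that update (the quadratic loss, data values, and learning rates here are hypothetical stand-ins):

```python
import torch

theta = torch.nn.Parameter(torch.tensor([1.0]))  # original_parameter
inner_lr, outer_lr = 0.1, 0.01

def loss_fn(param, data):
    # hypothetical stand-in for the model's loss
    return ((param * data) ** 2).sum()

part_1_data = torch.tensor([2.0])  # support set
part_2_data = torch.tensor([3.0])  # query set

# Inner step: create_graph=True makes this gradient itself differentiable,
# so the outer gradient can flow back through the adaptation step.
inner_loss = loss_fn(theta, part_1_data)
(g_inner,) = torch.autograd.grad(inner_loss, theta, create_graph=True)
adapted = theta - inner_lr * g_inner  # adapted_parameter (not a leaf)

# Outer step [1]: gradient of the query loss w.r.t. the ORIGINAL parameter.
outer_loss = loss_fn(adapted, part_2_data)
(g_outer,) = torch.autograd.grad(outer_loss, theta)

with torch.no_grad():
    theta -= outer_lr * g_outer  # final_parameter
```

Because adapted depends on theta through the inner gradient, the outer autograd.grad() call differentiates through both passes.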