PyTorch RuntimeError: Trying to backward through the graph a second time

Hi Bernardo!

I assume that the outputs of your Value functions depend somehow on
your Policy Network and that you need to take these dependencies into
account when you optimize your Policy Network. (If not, you could get
rid of the retain_graph=True and .detach() the outputs of your Value
functions before you use them in anything that has to do with your Policy
Network. Doing so would solve the “backward a second time” error you
reported in your first post.)
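To illustrate the .detach() route, here is a minimal sketch. (The names value_net, policy_net, and the particular losses are made up for illustration; they are not taken from your code.)

```python
import torch

# stand-ins for your Value function and Policy Network
value_net = torch.nn.Linear(4, 1)
policy_net = torch.nn.Linear(4, 2)
value_optimizer = torch.optim.SGD(value_net.parameters(), lr=0.01)
policy_optimizer = torch.optim.SGD(policy_net.parameters(), lr=0.01)

state = torch.randn(8, 4)

# update the Value function
value = value_net(state)
value_loss = value.pow(2).mean()
value_optimizer.zero_grad()
value_loss.backward()                 # no retain_graph=True needed
value_optimizer.step()

# .detach() the Value output so the Policy loss treats it as a
# constant and does not try to backward through value_net's
# (already-freed) graph a second time
advantage = value.detach()
logp = policy_net(state).log_softmax(dim=1)[:, 0]
policy_loss = -(logp * advantage.squeeze()).mean()
policy_optimizer.zero_grad()
policy_loss.backward()                # no "backward a second time" error
policy_optimizer.step()
```

Without the .detach(), the second .backward() would try to traverse value_net's graph, which was freed by the first .backward().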

The problem is (if you leave the dependency of your Policy Network on
your Value functions in place) that when you call .backward() on your
Policy-Network loss, you will also backpropagate through those Value
functions. But you have already done something along the lines of calling
optimizer.step() on your Value functions and doing so modifies the
Parameters of your Value functions inplace, leading to the error you see.
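Here is a hypothetical minimal reproduction of that sequence (the names are placeholders, and x with requires_grad = True stands in for the Value function's input coming from your Policy Network):

```python
import torch

value_net = torch.nn.Linear(3, 1)
opt = torch.optim.SGD(value_net.parameters(), lr=0.1)

x = torch.randn(5, 3, requires_grad=True)   # stands in for a Policy output
v = value_net(x)

v.sum().backward(retain_graph=True)   # keep the graph around
opt.step()                            # modifies value_net.weight inplace

policy_loss = v.sum()                 # depends on value_net's graph
msg = None
try:
    policy_loss.backward()            # needs value_net.weight as it was
except RuntimeError as err:           # saved before opt.step()
    msg = str(err)
    print(msg)
    # RuntimeError: one of the variables needed for gradient computation
    # has been modified by an inplace operation ...
```

The backward pass through the Linear layer needs the saved weight to compute the gradient with respect to its input, and autograd detects that the weight's version changed after it was saved.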

Assuming that this is what is going on, you should modify the forward
passes of your Value functions so that their Parameters (that are being
modified inplace) are not, themselves, needed in Policy Network’s
backward pass. You can probably achieve this with something like:

# in ValueFunction
linA = torch.nn.Linear(in_features, out_features)
...
# instead of
# y = linA(x)
# use
y = torch.nn.functional.linear(x, linA.weight.clone(), linA.bias.clone())

The point is that when backpropagating through ValueFunction, the
clone()s of linA’s Parameters are used and those clone()s haven’t
been modified by calling optimizer.step().
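Putting that together, a sketch of what such a ValueFunction could look like (the module and variable names are made up for illustration):

```python
import torch

class ValueFunction(torch.nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linA = torch.nn.Linear(in_features, out_features)

    def forward(self, x):
        # backward saves the clone()s, not linA's Parameters, so a later
        # optimizer.step() on linA does not invalidate the saved graph
        return torch.nn.functional.linear(
            x, self.linA.weight.clone(), self.linA.bias.clone()
        )

value_fn = ValueFunction(3, 1)
opt = torch.optim.SGD(value_fn.parameters(), lr=0.1)

x = torch.randn(5, 3, requires_grad=True)   # stands in for a Policy output
v = value_fn(x)
v.sum().backward(retain_graph=True)
opt.step()                                  # modifies linA's Parameters inplace
v.sum().backward()                          # succeeds: the clone()s were saved
```

Note that gradients still flow through clone() back to linA's Parameters, so optimizing the ValueFunction works as before.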

For further insight into what is going on in this situation I described or for
some suggestions on how to debug your issue if it’s something else that
is going on, please see this post:

Best.

K. Frank