Modified PPO Example: loss_value.backward(retain_graph=True)?

Hi all,

I’ve modified the PPO tutorial to use a custom environment. However, in the training step, when I call “loss_value.backward()” it eventually throws “RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed)…”

If I naively change it to “loss_value.backward(retain_graph=True)” it works, but I’m not sure why the error was thrown in the original case. Any information on why this is occurring, and what the implications of using “loss_value.backward(retain_graph=True)” are, would be appreciated.

I stepped through the output of my custom env, and as far as I can tell the obs, reward, etc. are still attached to their computation graphs.
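In case it helps, here’s a minimal sketch (not my actual code) of the pattern I suspect is happening: a tensor produced in one step stays attached to an earlier graph, that graph’s saved tensors get freed by the first backward(), and the next loss then tries to backprop through it again.

```python
import torch

w = torch.ones(1, requires_grad=True)
obs = w.exp()          # stand-in for an "env output" still attached to w's graph

loss1 = (obs ** 2).sum()
loss1.backward()       # frees the saved tensors of the graph through `obs`

loss2 = (obs ** 2).sum()
raised = False
try:
    loss2.backward()   # backprops through the already-freed exp node
except RuntimeError:
    raised = True      # "Trying to backward through the graph a second time..."
print("second backward raised:", raised)

# Detaching the env output at the step boundary cuts the old graph,
# so each new loss only backprops through freshly built nodes:
obs = w.exp().detach()
```

If that’s the right diagnosis, detaching the env outputs would fix it without retain_graph=True, but I may be off base about what the PPO loss is actually holding onto.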

Thanks in advance!


Please post your code and/or provide a link to the mentioned tutorial to help solve your issue.