I guess the issue might be raised by using:

```python
critic_loss.backward(retain_graph=True)
...
actor_loss.backward(retain_graph=True)
...
self.agents[idx - 1].actor.optimizer.step()
self.agents[idx - 1].update_network_parameters()
```
Using `retain_graph=True` won't release the computation graph, which often yields these types of errors (e.g. if the parameters were already updated in-place, the forward activations stored in the retained graph are stale, as described here).
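Not from your code, but a minimal standalone sketch of this failure mode, using a toy `nn.Linear` model and two losses sharing one forward pass (all names here are hypothetical):

```python
import torch

# Toy setup: two losses computed from the same forward pass.
model = torch.nn.Linear(2, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# requires_grad on the input forces backward to use the (saved) weight.
x = torch.randn(4, 2, requires_grad=True)
out = model(x)

loss1 = out.mean()
loss2 = (out ** 2).mean()

# First backward keeps the graph alive for the second loss.
loss1.backward(retain_graph=True)

# In-place parameter update: the weight saved in the retained
# graph is now stale (its version counter was bumped).
optimizer.step()

# Second backward through the old graph detects the stale tensor
# and typically fails with "one of the variables needed for
# gradient computation has been modified by an inplace operation".
try:
    loss2.backward()
    failed = False
except RuntimeError as e:
    failed = True
    print("RuntimeError:", e)
```

Calling `optimizer.step()` only after all backward passes (or recomputing the forward pass per loss) avoids the error without retaining a stale graph.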
Could you explain why `retain_graph=True` is used?