I am trying to run a multi-agent reinforcement learning project and I am getting the following error:
Traceback (most recent call last):
  File "E:\USER\Desktop\TD3p\V2\main.py", line 162, in <module>
    marl_agents.learn(memory, writer, steps_total)
  File "E:\USER\Desktop\TD3p\V2\matd3.py", line 118, in learn
    self.agents[agent_idx].actor_loss.backward()
  File "E:\anaconda3\envs\pytorch\lib\site-packages\torch\_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "E:\anaconda3\envs\pytorch\lib\site-packages\torch\autograd\__init__.py", line 154, in backward
    Variable._execution_engine.run_backward(
RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
My code is here:
for agent_idx in range(self.n_agents):
    ...
    # critic loss calculation
    self.agents[agent_idx].critic_loss = F.mse_loss(current_Q1.float(), target_Q.float()) + \
                                         F.mse_loss(current_Q2.float(), target_Q.float())
    # critic optimization
    self.agents[agent_idx].critic.optimizer.zero_grad()
    self.agents[agent_idx].critic_loss.backward()
    self.agents[agent_idx].critic.optimizer.step()

    if steps_total % self.freq == 0 and steps_total > 0:
        # actor loss calculation
        self.agents[agent_idx].actor_loss = -T.mean(self.agents[agent_idx].critic.Q1(states, mu))
        # actor optimization
        self.agents[agent_idx].actor.optimizer.zero_grad()
        self.agents[agent_idx].actor_loss.backward()
        self.agents[agent_idx].actor.optimizer.step()

    self.agents[agent_idx].update_network_parameters()
The error occurs at the actors' optimization step: self.agents[agent_idx].actor_loss.backward().
Firstly, each agent needs to call backward(), so across the loop iterations backward() is definitely called multiple times. However, I think each agent calls backward() on its own graph independently, so I should not need to set retain_graph=True; for example, the second agent does not need to access the saved tensors of the first agent.
Secondly, the problem only happens for the actor losses, even though the critic losses and the actor losses follow the same order of execution: compute the loss, then optimize. Only after the previous agent completes its computation and optimization does the next agent execute this code. The critics can call backward() multiple times for optimization, but the actors cannot.
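To make sure I understand the error itself, I wrote a tiny standalone sketch (the tensor names are made up and have nothing to do with my project). As far as I can tell, it reproduces the same RuntimeError whenever two losses share part of one computation graph, because the first backward() frees the saved tensors of the shared part:

import torch

x = torch.randn(3, requires_grad=True)
shared = x * x            # this node saves x for its backward pass
loss1 = shared.sum()
loss2 = shared.mean()

loss1.backward()          # frees the saved tensors of the shared graph
loss2.backward()          # RuntimeError: Trying to backward through the graph a second time

So I suspect that something used by every agent's actor loss (maybe the joint actions mu, if they are built only once) is shared between the agents' graphs, but I am not sure that this is actually what happens in my code.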
I have also tried detaching the actor loss before backpropagation. The modified code runs, but the loss curves look strange: all the critics' losses keep fluctuating, while all the actors' losses keep increasing. I added a small standalone check of what detach() does, shown after the modified code below.
The modified code is here:
for agent_idx in range(self.n_agents):
    ...
    # critic loss calculation
    self.agents[agent_idx].critic_loss = F.mse_loss(current_Q1.float(), target_Q.float()) + \
                                         F.mse_loss(current_Q2.float(), target_Q.float())
    # critic optimization
    self.agents[agent_idx].critic.optimizer.zero_grad()
    self.agents[agent_idx].critic_loss.backward()
    self.agents[agent_idx].critic.optimizer.step()

    if steps_total % self.freq == 0 and steps_total > 0:
        # actor loss calculation
        self.agents[agent_idx].actor_loss = self.agents[agent_idx].critic.Q1(states, mu).detach()
        self.agents[agent_idx].actor_loss.requires_grad = True
        self.agents[agent_idx].actor_loss = -T.mean(self.agents[agent_idx].actor_loss)
        # actor optimization
        self.agents[agent_idx].actor.optimizer.zero_grad()
        self.agents[agent_idx].actor_loss.backward()
        self.agents[agent_idx].actor.optimizer.step()

    self.agents[agent_idx].update_network_parameters()
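Here is the small standalone check I mentioned above (again, the names are made up, not from my project). If I understand autograd correctly, detach() cuts the tensor off from the graph, so turning requires_grad back on afterwards does not reconnect it to the original parameters:

import torch

w = torch.randn(3, requires_grad=True)   # stands in for the actor's parameters
q = w * 2.0                               # stands in for critic.Q1(states, mu)

detached = q.detach()                     # cut off from the graph that contains w
detached.requires_grad = True
actor_loss = -detached.mean()

actor_loss.backward()                     # runs without error, but...
print(w.grad)                             # prints None: w never receives a gradient

If that is right, then in my modified code the actors are never actually updated, which would match the strange loss curves, but then I still do not know what the correct fix is.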
I would appreciate it if anyone could give me some hints or help me fix this bug.