Inplace operation errors when implementing A2C algorithm

Hi,
I’m implementing the A2C algorithm from scratch. However, I run into this error: RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [512, 1]], which is output 0 of AsStridedBackward0, is at version 3; expected version 2 instead.
I run 8 networks in parallel to collect experience, then update the global network through the optimize function once per episode. The first episode runs fine, but the second one fails. The error may come from the critic layer: the actor and critic share all non-output layers, so that seems a likely place for the bug.
My PyTorch version is 1.10.2+cu13
Thanks,

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

# LR, GAMMA and ENTROPY_WEIGHT are constants defined elsewhere in the script.

class ActorCriticNet(nn.Module):
    def __init__(self, scope, n_channels, n_actions):
        super(ActorCriticNet, self).__init__()
        self.scope = scope
        # Convolutional body shared by the actor and the critic
        self.net = nn.Sequential(
            nn.Conv2d(n_channels, 32, 8, 4),
            nn.ReLU(),
            nn.Conv2d(32, 64, 4, 2),
            nn.ReLU(),
            nn.Conv2d(64, 32, 3, 1),
            nn.ReLU(),
        )
        self.fc = nn.Linear(7 * 7 * 32, 512)
        self.actor = nn.Linear(512, n_actions)
        self.critic = nn.Linear(512, 1)
        self.optimizer = optim.Adam(self.parameters(), lr=LR)

    def forward(self, x):
        x = self.net(x)
        x = x.view(-1, 7*7*32)
        x = F.relu(self.fc(x))
        policy = F.softmax(self.actor(x), dim=-1)
        value = self.critic(x)
        return policy, value

    def optimize(self, workers):
        if self.scope == 'global':
            for worker in workers:
                self.optimizer.zero_grad()
                r = 0
                # Walk the episode backwards to accumulate the discounted return
                for reward, proba, val in worker.data[::-1]:
                    r = reward + GAMMA*r
                    policy_loss = -torch.log(proba) * (r - val)
                    entropy_loss = -ENTROPY_WEIGHT * (proba * torch.log(proba))
                    value_loss = (r - val) ** 2
                    loss = policy_loss + entropy_loss + value_loss
                    loss.backward(retain_graph=True)
                self.optimizer.step()

I guess the error is raised because you are retaining the graph in the backward call:

loss.backward(retain_graph=True)

This argument is usually added as a workaround for a different error, and it then causes these “inplace operation” errors: the retained graph still refers to the parameter values from the forward pass, but the optimizer has already updated those parameters inplace in the previous iteration.
Could you explain why you are using retain_graph=True in your code?
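
To illustrate the mechanism, here is a minimal sketch that is not taken from your code: the two small layers, the SGD optimizer, and the names body, critic and inplace_error are all assumptions made for the example. It retains the graph, lets the optimizer update the weights inplace, and then backpropagates through the retained graph again.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical tiny network: a shared body followed by a critic head.
body = nn.Linear(4, 8)
critic = nn.Linear(8, 1)
params = list(body.parameters()) + list(critic.parameters())
optimizer = torch.optim.SGD(params, lr=0.1)

value = critic(torch.relu(body(torch.randn(2, 4))))
loss = value.pow(2).sum()

# First backward keeps the graph (and its saved tensors) alive.
loss.backward(retain_graph=True)

# The optimizer updates the weights inplace, which bumps their version counters.
optimizer.step()

# The retained graph saved the critic weight at its old version, so a second
# backward through the same graph is expected to fail the version check.
try:
    loss.backward()
    inplace_error = None
except RuntimeError as err:
    inplace_error = str(err)

print(inplace_error)
```

If the forward pass is recomputed after optimizer.step(), the new graph saves the updated weights and the check passes, which is why retain_graph=True should not be needed here.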

When I remove retain_graph, I get a different error: RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
So I followed the hint. I also call loss.backward() many times, but I’m not really sure retain_graph is the best option.

I think we should focus on fixing the first issue properly without using retain_graph=True unless this is really your use case.

RuntimeError: Trying to backward through the graph a second time 

is raised e.g. if you are appending the current computation graph to the previous one (which was already freed during the previous backward call).
This is often the case e.g. if you are using a recurrent structure without detaching the inputs in the new iteration.
Based on your code I assume that all workers are independent. Or do they share some parameters, data, etc.?
I would probably start checking if

        for worker in workers:
            self.optimizer.zero_grad()
            r = 0
            for reward, proba, val in worker.data[::-1]:
                r = reward + GAMMA*r
                policy_loss = -torch.log(proba) * (r - val)
                entropy_loss = -ENTROPY_WEIGHT * (proba * torch.log(proba))
                value_loss = (r - val) ** 2
                loss = policy_loss + entropy_loss + value_loss
                loss.backward(retain_graph=True)
            self.optimizer.step()

reuses some tensors (and thus also the computation graph).
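
A small sketch of that failure mode, under assumptions: the list data is a hypothetical stand-in for worker.data, and the single linear layer and the target value of 1.0 are made up for the example. The stored tensor stays attached to the graph from the first forward pass, so the second optimize call backpropagates through a graph that was already freed.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

net = nn.Linear(4, 1)
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)

# Hypothetical stand-in for worker.data: outputs stored with their graph attached.
data = [net(torch.randn(1, 4))]

def optimize():
    optimizer.zero_grad()
    # Squared error against an arbitrary target of 1.0 (an assumption for the demo).
    loss = sum((1.0 - val).pow(2).sum() for val in data)
    loss.backward()  # frees the saved tensors of the stored graph
    optimizer.step()

optimize()  # first "episode": the graph is fresh, so this works

try:
    optimize()  # second "episode": same stored tensor, graph already freed
    second_error = None
except RuntimeError as err:
    second_error = str(err)

print(second_error)
```

If this matches what you see, the fix would be to recompute the stored outputs for every episode instead of keeping the old ones around.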

I found the problem: I did not reset worker.data after every episode, so the network kept seeing the same data again and again. retain_graph is unnecessary after all.
Thanks for helping me solve the issue!
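
For anyone who lands here later, this is roughly the shape of the fix as a simplified sketch, not my actual training code. The Worker class, run_episode, the random states, the constant reward of 1.0, the small network and the GAMMA value are all assumptions for illustration, and only the value loss is kept to make it short. The buffer is cleared at the start of each episode, and the per-step losses are summed so that backward is called once without retain_graph.

```python
import torch
import torch.nn as nn

GAMMA = 0.99  # assumed discount factor for this sketch

class Worker:
    """Hypothetical worker that stores (reward, value) pairs for one episode."""
    def __init__(self):
        self.data = []

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)
worker = Worker()

def run_episode(worker, steps=5):
    worker.data.clear()  # reset the buffer so no stale graphs are reused
    for _ in range(steps):
        state = torch.randn(1, 4)        # random stand-in for an observation
        value = net(state)               # fresh graph built with current weights
        worker.data.append((1.0, value)) # constant reward, assumed for the demo

def optimize(worker):
    optimizer.zero_grad()
    r = 0.0
    losses = []
    # Walk the episode backwards to accumulate the discounted return.
    for reward, val in reversed(worker.data):
        r = reward + GAMMA * r
        losses.append((r - val).pow(2).sum())
    # One backward pass over the summed loss, so retain_graph is not needed.
    torch.stack(losses).sum().backward()
    optimizer.step()

for episode in range(3):
    run_episode(worker)
    optimize(worker)
```

Summing the losses before calling backward also avoids backpropagating through the shared layers once per time step.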