Inplace operation errors when implementing the A2C algorithm

Hi,
I’m implementing the A2C algorithm from scratch, but I keep running into: RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [512, 1]], which is output 0 of AsStridedBackward0, is at version 3; expected version 2 instead.
I run 8 worker networks in parallel to accumulate experience and then optimize the global network via the optimize function once per episode. The first episode runs fine, but the second one fails. The error seems to come from the critic layer, and since the actor and critic share all non-output layers, that is probably where the bug is triggered.
My PyTorch version is 1.10.2+cu113.
Thanks,

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

# LR, GAMMA and ENTROPY_WEIGHT are hyperparameters defined elsewhere in my script.

class ActorCriticNet(nn.Module):
    def __init__(self, scope, n_channels, n_actions):
        super(ActorCriticNet, self).__init__()
        self.scope = scope
        self.net = nn.Sequential(
            nn.Conv2d(n_channels, 32, 8, 4),
            nn.ReLU(),
            nn.Conv2d(32, 64, 4, 2),
            nn.ReLU(),
            nn.Conv2d(64, 32, 3, 1),
            nn.ReLU(),
        )
        self.fc = nn.Linear(7*7*32, 512)
        self.actor = nn.Linear(512, n_actions)
        self.critic = nn.Linear(512, 1)
        self.optimizer = optim.Adam(self.parameters(), lr=LR)

    def forward(self, x):
        x = self.net(x)
        x = x.view(-1, 7*7*32)
        x = F.relu(self.fc(x))
        policy = F.softmax(self.actor(x), dim=-1)
        value = self.critic(x)
        return policy, value

    def optimize(self, workers):
        if self.scope == 'global':
            for worker in workers:
                self.optimizer.zero_grad()
                r = 0
                for reward, proba, val in worker.data[::-1]:
                    r = reward + GAMMA*r
                    policy_loss = -torch.log(proba) * (r - val)
                    entropy_loss = -ENTROPY_WEIGHT * (proba * torch.log(proba))
                    value_loss = (r - val) ** 2
                    loss = policy_loss + entropy_loss + value_loss
                    loss.backward(retain_graph=True)
                self.optimizer.step()

I guess the error is raised, since you are retaining the graph in the backward call:

loss.backward(retain_graph=True)

Usually, this argument is used as a workaround to try to fix another error, and it then creates these “inplace manipulation” errors, since the parameters were already updated in-place in the previous iteration.
Could you explain why you are using retain_graph=True in your code?
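For illustration, this kind of failure can be reproduced with a small toy example (completely made up, not taken from your code): the graph saves a parameter for the backward pass, optimizer.step() then updates that parameter in-place, and the second backward through the retained graph hits the version check:

import torch
import torch.nn as nn

# Two layers, so the second layer's weight is saved for the backward pass.
model = nn.Sequential(nn.Linear(4, 8), nn.Linear(8, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.1)

out = model(torch.randn(2, 4)).sum()
out.backward(retain_graph=True)  # first backward works
opt.step()                       # updates the saved weight in-place
out.backward()                   # raises the "modified by an inplace operation" error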

When I remove the retain_graph I encounter another RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
So I followed the hint, since loss.backward() is called many times in my loop, but I’m not really sure retain_graph is the best option.

I think we should focus on fixing the first issue properly without using retain_graph=True unless this is really your use case.

RuntimeError: Trying to backward through the graph a second time 

is raised e.g. if you are appending the current computation graph to the previous one (which was already freed during the previous backward call).
This is often the case e.g. if you are using a recurrent structure without detaching the inputs in the new iteration.
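As a toy illustration of that recurrent case (again a made-up snippet, not your code): if the hidden state is not detached, each iteration extends the old graph, and the next backward() fails because that part was already freed:

import torch
import torch.nn as nn

cell = nn.Linear(4, 4)
h = torch.zeros(1, 4)

for step in range(3):
    x = torch.randn(1, 4)
    h = torch.tanh(cell(x) + h)
    loss = h.pow(2).sum()
    loss.backward()
    # Without this detach, h would still carry the previous graph and the next
    # loss.backward() would raise "Trying to backward through the graph a second time".
    h = h.detach()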
Based on your code I assume that all workers are independent, or do they share some parameters, data, etc.?
I would probably start checking if

        for worker in workers:
            self.optimizer.zero_grad()
            r = 0
            for reward, proba, val in worker.data[::-1]:
                r = reward + GAMMA*r
                policy_loss = -torch.log(proba) * (r - val)
                entropy_loss = -ENTROPY_WEIGHT * (proba * torch.log(proba))
                value_loss = (r - val) ** 2
                loss = policy_loss + entropy_loss + value_loss
                loss.backward(retain_graph=True)
            self.optimizer.step()

reuses some tensors (and thus also the computation graph).

I found the problem. It’s because I did not reset worker.data after every episode, so the network kept seeing the same data (and its old computation graph) again and again. retain_graph is unnecessary anyway.
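For anyone hitting the same thing, here is a stripped-down sketch of the failure mode and the fix (just a toy net and a plain list standing in for worker.data, not my actual worker code):

import torch
import torch.nn as nn

net = nn.Linear(4, 2)
opt = torch.optim.SGD(net.parameters(), lr=0.01)
buffer = []                       # plays the role of worker.data

for episode in range(2):
    out = net(torch.randn(3, 4))  # "experience" that is still attached to the graph
    buffer.append(out.sum())

    opt.zero_grad()
    loss = sum(buffer)
    loss.backward()               # without the clear() below, episode 1 would try to
    opt.step()                    # backward through episode 0's already-freed graph

    buffer.clear()                # the fix: reset the stored data after every episode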
Thanks for helping me solve the issue!


Hello @DungNguyen

It appears as if I’m facing the exact same problem as you while implementing multiple A2C agents in an RL gym env. I have encountered the same errors, and I suspect your solution might be helpful for my use case.

Can you explain what worker.data represents conceptually and how you went about resetting it after every episode?

Thanks in advance 🙂

Hi @fahmyadan
Can you check here if one of the errors relates to what you’re seeing?
https://pytorch.org/rl/reference/generated/knowledge_base/PRO-TIPS.html

Hi @vmoens ,

I figured out that the issue I was facing was a result of how I was training the A2C model.

The agent, in my case, is set up to train on experience collected within each episode and to update the parameters based on the loss of that episode.

The problem was that I wasn’t resetting the buffer at the end of the episodes, so the gradients were being computed from the same experience multiple times, which led to the error.

By adding some del action and del reward statements to clear my buffer at the end of the training logic, I was able to train the algorithm.

Thanks for all the support @vmoens