Hi,

I’m implementing the A2C algorithm from scratch, but I run into the following error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [512, 1]], which is output 0 of AsStridedBackward0, is at version 3; expected version 2 instead.

I run 8 networks in parallel to collect experience, and after each episode I update the global network through its optimize function. The first episode ran fine, but the second one failed. The [512, 1] shape in the message matches the critic layer (nn.Linear(512, 1)), so I suspect the error comes from there. The actor and critic also share all non-output layers, which might make this kind of bug more likely.
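To narrow this down, here is a minimal sketch that I believe triggers the same kind of error. It is not my real training code: the two small layers (`shared`, `critic`), the SGD optimizer, and the random input are hypothetical stand-ins. The idea is that outputs computed before `optimizer.step()` still reference the old weights, so calling `backward` on them after the step fails.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical stand-ins for the shared layer and the critic head.
shared = nn.Linear(4, 512)
critic = nn.Linear(512, 1)
params = list(shared.parameters()) + list(critic.parameters())
optimizer = optim.SGD(params, lr=0.1)

# One forward pass produces two value estimates that share a single graph.
states = torch.randn(2, 4)
values = critic(torch.relu(shared(states)))

error_message = None
try:
    for i in range(2):
        optimizer.zero_grad()
        loss = (1.0 - values[i]) ** 2
        # The first call succeeds. The second call reuses the retained graph,
        # which still expects the weights as they were before the step.
        loss.backward(retain_graph=True)
        optimizer.step()  # updates the weights in place
except RuntimeError as exc:
    error_message = str(exc)

print(error_message)
```

If this is the same mechanism, then `retain_graph=True` only hides the problem for the first update, and the stored `proba` and `val` tensors become stale as soon as the optimizer has stepped.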

My PyTorch version is 1.10.2+cu13.

Thanks,

```
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

# LR, GAMMA and ENTROPY_WEIGHT are constants defined elsewhere (not shown).


class ActorCriticNet(nn.Module):
    def __init__(self, scope, n_channels, n_actions):
        super(ActorCriticNet, self).__init__()
        self.scope = scope
        # Convolutional trunk shared by the actor and the critic
        self.net = nn.Sequential(
            nn.Conv2d(n_channels, 32, 8, 4),
            nn.ReLU(),
            nn.Conv2d(32, 64, 4, 2),
            nn.ReLU(),
            nn.Conv2d(64, 32, 3, 1),
            nn.ReLU(),
        )
        self.fc = nn.Linear(7*7*32, 512)
        self.actor = nn.Linear(512, n_actions)  # policy head
        self.critic = nn.Linear(512, 1)         # value head
        self.optimizer = optim.Adam(self.parameters(), lr=LR)

    def forward(self, x):
        x = self.net(x)
        x = x.view(-1, 7*7*32)
        x = F.relu(self.fc(x))
        policy = F.softmax(self.actor(x), dim=-1)
        value = self.critic(x)
        return policy, value

    def optimize(self, workers):
        if self.scope == 'global':
            for worker in workers:
                self.optimizer.zero_grad()
                r = 0
                # Walk the episode backwards to accumulate the discounted return
                for reward, proba, val in worker.data[::-1]:
                    r = reward + GAMMA*r
                    policy_loss = -torch.log(proba) * (r - val)
                    entropy_loss = -ENTROPY_WEIGHT * (proba * torch.log(proba))
                    value_loss = (r - val) ** 2
                    loss = policy_loss + entropy_loss + value_loss
                    loss.backward(retain_graph=True)
                self.optimizer.step()
```
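One restructuring I could try, as a rough sketch rather than a confirmed fix: run a fresh forward pass over the stored states at update time, sum the per-step losses, and call `backward()` once before a single `optimizer.step()`, so that no graph is reused after the weights change. Everything here is hypothetical: `TinyActorCritic`, `update`, the values of `GAMMA` and `ENTROPY_WEIGHT`, and the assumption that states and actions are stored alongside the rewards. It also uses the log-probability of the chosen action, detaches the advantage in the policy term, and picks the entropy sign so that higher entropy lowers the loss, all of which differ from the optimize method above.

```python
import torch
import torch.nn as nn
import torch.optim as optim

GAMMA = 0.99           # assumed value, for illustration only
ENTROPY_WEIGHT = 0.01  # assumed value, for illustration only

torch.manual_seed(0)


class TinyActorCritic(nn.Module):
    """Hypothetical small network with a shared layer and two heads."""

    def __init__(self, n_inputs=4, n_actions=2):
        super().__init__()
        self.fc = nn.Linear(n_inputs, 16)
        self.actor = nn.Linear(16, n_actions)
        self.critic = nn.Linear(16, 1)

    def forward(self, x):
        x = torch.relu(self.fc(x))
        return torch.softmax(self.actor(x), dim=-1), self.critic(x)


def update(model, optimizer, states, actions, rewards):
    """One update from one rollout: one forward, one backward, one step."""
    # Fresh forward pass, so the graph refers to the current weights.
    policy, values = model(states)
    values = values.squeeze(-1)

    # Discounted returns, accumulated from the last step to the first.
    returns, r = [], 0.0
    for reward in reversed(rewards):
        r = reward + GAMMA * r
        returns.append(r)
    returns = torch.tensor(returns[::-1], dtype=values.dtype)

    log_policy = torch.log(policy)
    log_prob = log_policy.gather(1, actions.unsqueeze(1)).squeeze(1)
    advantage = returns - values

    # Assumption: the advantage is treated as a constant in the policy term.
    policy_loss = -(log_prob * advantage.detach()).sum()
    # Assumption: this term is minus the entropy, so minimizing it raises entropy.
    entropy_loss = ENTROPY_WEIGHT * (policy * log_policy).sum()
    value_loss = (advantage ** 2).sum()
    loss = policy_loss + entropy_loss + value_loss

    optimizer.zero_grad()
    loss.backward()   # the graph is used exactly once, so no retain_graph
    optimizer.step()
    return loss.item()


# Usage with made-up rollout data.
model = TinyActorCritic()
optimizer = optim.Adam(model.parameters(), lr=1e-3)
states = torch.randn(5, 4)
actions = torch.tensor([0, 1, 0, 1, 1])
rewards = [1.0, 0.0, 0.0, 1.0, 1.0]
losses = [update(model, optimizer, states, actions, rewards) for _ in range(3)]
```

Calling `update` repeatedly works here because every call rebuilds the graph from the current weights. The cost is that the workers would need to store states and actions instead of the `proba` and `val` tensors.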