RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [768, 256]] is at version 3; expected version 2 instead.

Hi,

I am using PyTorch 1.9 on Google Colab. I tried using torch.autograd.set_detect_anomaly(True) to get more information, but the stack trace doesn't change and it doesn't show any additional info.
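For reference, this is roughly how I am enabling it (a minimal sketch on a dummy tensor, not my actual model):

import torch

# Enable anomaly detection globally for the whole script ...
torch.autograd.set_detect_anomaly(True)

# ... or wrap just the suspect region in the context-manager form
x = torch.randn(4, requires_grad=True)
loss = (x * 2).sum()
with torch.autograd.detect_anomaly():
    loss.backward()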

def _a2c_update(self, value, batch_idx):
        returns, advantages = self._discount_rewards(value, self.transitions[batch_idx])
        for transition, _return, advantage in zip(self.transitions[batch_idx], returns, advantages):
            reward, index, output, value, done = transition
            if done:
                continue

            advantage = advantage.detach()
            probs = F.softmax(output, dim=-1)
            log_probs = torch.log(probs)
            log_action_prob = log_probs[index]
            policy_loss = -log_action_prob * advantage
            value_loss = (.5 * (value - _return)**2)
            entropy = (-log_probs * probs).mean()

            # add up the loss over time
            self.model_loss += policy_loss + 0.5 * value_loss - 0.1 * entropy

            self.statistics.stats_episode_append(
                reward=reward,
                policy=policy_loss.item(),
                value=value_loss.item(),
                entropy=entropy.item(),
                confidence=torch.mean(torch.exp(log_action_prob)).item()
            )
        self.model_updates += 1

        self.transitions[batch_idx] = []

        if self.model_loss == 0 or self.model_updates % self.batch_size != 0:
            return

        # Backpropagation is only invoked once all agents in the batch have
        # performed their update, to reduce computational complexity

        self.statistics.stats_episode_append(loss=self.model_loss.item())
        self.optimizer.zero_grad()
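        # retain_graph=True keeps the graph's saved tensors alive so the same
        # graph can be backpropagated through again on a later call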
        self.model_loss.backward(retain_graph=True)
        nn.utils.clip_grad_norm_(self.model.parameters(), self.config['training']['optimizer']['clip_grad_norm'])
        self.optimizer.step()

        self.model_loss = 0.

@albanD, please let me know what steps I should take to resolve the issue.

I tried calling self.model_loss.clone() right before self.model_loss.backward(), but that didn't work.

You should see a warning printed before the stack trace.
But I think that Colab is dropping some of these warnings :confused:

You can check which Tensor has the size mentioned in the error and look for where you're modifying it in place.
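For example, something along these lines can locate it (a sketch with a placeholder model; _version is the internal counter that shows up in the error message):

import torch
import torch.nn as nn

model = nn.Linear(256, 768)  # placeholder; use your actual model here

# The error mentions a [768, 256] tensor, so look for parameters of that shape
for name, p in model.named_parameters():
    if tuple(p.shape) == (768, 256):
        # _version is bumped every time the tensor is modified in place,
        # e.g. by optimizer.step()
        print(name, p._version)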

Thanks, @albanD, for your quick response.
I tried downgrading from PyTorch 1.9 to PyTorch 1.4 and the issue went away.
Why is that happening? Does that mean there are issues in PyTorch 1.4 as well? Will the results be incorrect?
I am using this code from the GitHub repository of a paper published in 2019.

> I tried downgrading from PyTorch 1.9 to PyTorch 1.4 and the issue went away.

In that case I would bet that the optimizer is doing the in-place update on a weight and you're trying to backward again afterwards. The weight was modified in place by the optimizer, hence the error.
Versions before 1.8 (IIRC) had buggy optimizers for which the in-place update was not registered properly, so the result was silently wrong.
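A minimal sketch of that failing pattern (not your exact code), which reproduces the same error:

import torch
import torch.nn as nn

model = nn.Linear(256, 768)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 256, requires_grad=True)
loss = model(x).sum()

loss.backward(retain_graph=True)  # first backward, the graph is kept alive
opt.step()                        # in-place update bumps the weight's version counter

# The retained graph still refers to the old version of the weight, so a second
# backward through it raises the "modified by an inplace operation" RuntimeError.
loss.backward()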

You should make sure that you never do a forward, then an optimizer step, then a backward. You need to redo the forward after an optimizer step before doing a backward.
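In other words, the order should look something like this (a sketch):

import torch
import torch.nn as nn

model = nn.Linear(256, 768)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for _ in range(3):
    x = torch.randn(8, 256)
    loss = model(x).sum()  # forward builds a fresh graph with the current weights
    opt.zero_grad()
    loss.backward()        # backward through that fresh graph
    opt.step()             # the in-place parameter update happens only after backward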