RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [181, 128]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: enable anomaly detect

Hi, I’m new to PyTorch. I’ve seen a few posts about this error and concluded that modifying a variable in place causes it, but I can’t figure out where I’m doing an in-place operation. I’m attaching my code below (a simple character-level RNN model); please help me solve this issue. It might be tempting to blame the fact that I’m summing the loss over the characters, but I removed that, trained again, and the error still persists. Finally, when I set retain_graph=True, the version error above pops up, saying the tensors are not at the expected version. The error points to the first layer of my RNN.

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNN, self).__init__()
        
        self.hidden_size = hidden_size
        self.input_to_hidden = nn.Linear(input_size + hidden_size, hidden_size)
        self.input_to_output = nn.Linear(hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)
    
    def forward(self, input_tensor, hidden_tensor):
        x_a_t_minus_1 = torch.cat((input_tensor, hidden_tensor), dim=1)
        a_t = self.input_to_hidden(x_a_t_minus_1)
        y_t = self.input_to_output(a_t)
        return a_t, self.softmax(y_t)
    
    def _init_hidden(self):
        return torch.zeros(1, self.hidden_size)

epochs = 1000
total_words = len(words)
activations = rnn._init_hidden()
for epoch in range(epochs):
    idx = epoch%total_words
    word = words[idx]
    loss_at_char = torch.tensor(0)
    for char in word:
        char_repr = word_to_one_hot(char)
        target_repr = torch.tensor([vocab_dict[char]])
        activations, pred_prob = rnn.forward(char_repr, activations)
        loss_at_char = loss_at_char + criterion(pred_prob, target_repr)
    optimizer.zero_grad()
    loss_at_char.backward(retain_graph=True)
    optimizer.step()
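
To make sure I understand what autograd counts as an in-place operation, here is a toy example (not from my model, just my reading of the version-counter check) that produces the same kind of message:

import torch

w = torch.randn(3, requires_grad=True)
x = torch.randn(3)
y = (w * x).sum()   # autograd saves x, since it is needed for the gradient w.r.t. w
x.add_(1.0)         # in-place update bumps x's version counter from 0 to 1
# y.backward()      # uncommenting this raises the same "modified by an inplace operation" error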

I am also encountering this issue while building a GAN. I just updated my PyTorch installation today (I was having problems with my code causing a full server crash, so I updated my CUDA installation, drivers, and PyTorch environment, and now I get this error instead), and I have tried to pare my code down to the most simplified example that reproduces the error.

faulttest.py:

import torch
from torch import nn

config = {
    'batch_size': 8,
    'inp_x_size': 4,
    'latent_size': 2, 
}

class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        self.linear_stack = nn.Sequential(
            nn.Linear(config['inp_x_size'], config['latent_size']),
            nn.Linear(config['latent_size'], config['inp_x_size']),
        )
    def forward(self, x):
        return self.linear_stack(x)

class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        self.linear_stack = nn.Sequential(
            nn.Linear(config['inp_x_size'], 1)
        )
    def forward(self, x):
        return self.linear_stack(x)

def train( 
    generator: Generator, discriminator: Discriminator, 
    g_loss_fn: nn.Module, d_loss_fn: nn.Module, 
    g_optimizer: torch.optim.Optimizer, d_optimizer: torch.optim.Optimizer,
):
    for batch in range(1):
        #Load a batch
        x = torch.rand((config['batch_size'], config['inp_x_size']), dtype=torch.float32)
        g_forward = generator(x)
        g_loss = g_loss_fn(g_forward, x)
    
        #Backprop and optimization for generator
        g_optimizer.zero_grad()
        g_loss.backward(retain_graph=True)
        g_optimizer.step()
    
        #Calculate discriminator loss
        y = torch.rand((config['batch_size'], 1), dtype=torch.float32)
        d_score = discriminator(g_forward) #FAIL
        #d_score = discriminator(g_forward.detach()) #WORKS
        d_loss = d_loss_fn(d_score, y)
    
        #Backprop and optimization for discriminator
        d_optimizer.zero_grad()
        d_loss.backward()
        d_optimizer.step()

def main():
    torch.autograd.set_detect_anomaly(True)
    #Construct networks
    generator = Generator()
    discriminator = Discriminator()

    #Define loss functions
    g_loss_fn = nn.MSELoss()
    d_loss_fn = nn.MSELoss()

    #Define optimizers
    g_optimizer = torch.optim.Adam(generator.parameters(), lr=1E-4)
    d_optimizer = torch.optim.Adam(discriminator.parameters(), lr=1E-4)
    
    #Perform training cycles
    for epoch in range(1):
        train(generator, discriminator, g_loss_fn, d_loss_fn, g_optimizer, d_optimizer)

if __name__ == "__main__":
    main()

Console log:

(pytorch) jon@io:/mnt/ssd-storage/GAN_Playground$ python3 faulttest.py 
Torch: 2.1.2
/home/jon/anaconda3/envs/pytorch/lib/python3.11/site-packages/torch/autograd/__init__.py:251: UserWarning: Error detected in AddmmBackward0. Traceback of forward call that caused the error:
  File "/mnt/ssd-storage/GAN_Playground/faulttest.py", line 76, in <module>
    main()
  File "/mnt/ssd-storage/GAN_Playground/faulttest.py", line 73, in main
    train(generator, discriminator, g_loss_fn, d_loss_fn, g_optimizer, d_optimizer)
  File "/mnt/ssd-storage/GAN_Playground/faulttest.py", line 37, in train
    g_forward = generator(x)
  File "/home/jon/anaconda3/envs/pytorch/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/jon/anaconda3/envs/pytorch/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/ssd-storage/GAN_Playground/faulttest.py", line 18, in forward
    return self.linear_stack(x)
  File "/home/jon/anaconda3/envs/pytorch/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/jon/anaconda3/envs/pytorch/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/jon/anaconda3/envs/pytorch/lib/python3.11/site-packages/torch/nn/modules/container.py", line 215, in forward
    input = module(input)
  File "/home/jon/anaconda3/envs/pytorch/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/jon/anaconda3/envs/pytorch/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/jon/anaconda3/envs/pytorch/lib/python3.11/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
 (Triggered internally at /opt/conda/conda-bld/pytorch_1702400430266/work/torch/csrc/autograd/python_anomaly_mode.cpp:114.)
  Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
Traceback (most recent call last):
  File "/mnt/ssd-storage/GAN_Playground/faulttest.py", line 76, in <module>
    main()
  File "/mnt/ssd-storage/GAN_Playground/faulttest.py", line 73, in main
    train(generator, discriminator, g_loss_fn, d_loss_fn, g_optimizer, d_optimizer)
  File "/mnt/ssd-storage/GAN_Playground/faulttest.py", line 53, in train
    d_loss.backward()
  File "/home/jon/anaconda3/envs/pytorch/lib/python3.11/site-packages/torch/_tensor.py", line 492, in backward
    torch.autograd.backward(
  File "/home/jon/anaconda3/envs/pytorch/lib/python3.11/site-packages/torch/autograd/__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [2, 4]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

Notably, detaching the output of the generator before passing it into the discriminator “fixes” the error in my case.

You are running into this issue because you are reusing stale intermediates from the forward pass: g_optimizer.step() updates the generator’s parameters in place, so backpropagating d_loss through the old generator graph afterwards would compute wrong gradients, which is exactly what the version check is protecting you from.
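
As a rough sketch (reusing the names, module-level config, and MSE objectives from faulttest.py above), the usual pattern is to detach the generator output for the discriminator update, so that backward pass never walks through generator parameters that an optimizer step has already modified in place:

def train(
    generator: Generator, discriminator: Discriminator,
    g_loss_fn: nn.Module, d_loss_fn: nn.Module,
    g_optimizer: torch.optim.Optimizer, d_optimizer: torch.optim.Optimizer,
):
    for batch in range(1):
        x = torch.rand((config['batch_size'], config['inp_x_size']), dtype=torch.float32)
        y = torch.rand((config['batch_size'], 1), dtype=torch.float32)

        g_forward = generator(x)

        # Discriminator update on a detached copy: this backward pass never
        # touches the generator's parameters, so in-place updates of those
        # parameters cannot invalidate it.
        d_score = discriminator(g_forward.detach())
        d_loss = d_loss_fn(d_score, y)
        d_optimizer.zero_grad()
        d_loss.backward()
        d_optimizer.step()

        # Generator update: its graph is consumed exactly once, so
        # retain_graph=True is no longer needed.
        g_loss = g_loss_fn(g_forward, x)
        g_optimizer.zero_grad()
        g_loss.backward()
        g_optimizer.step()

The same reasoning applies to the RNN example at the top of the thread: the hidden state is carried across words without being detached, so each backward pass has to walk through graphs whose parameters were already changed by earlier optimizer.step() calls; detaching the hidden state between iterations (e.g. activations = activations.detach()) should avoid that.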

Fixed my issue. Weirdly, the original code was working on an older version of PyTorch and my model was training (before I introduced something that Adam was fine with but SGD would crash the server on), yet with this change my generator no longer improves (granted, I think I am using a really weird training method). Oh well, c’est la vie, back to method and hyperparameter tinkering. Thank you!