RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

Hi.

I'm training a self-supervised RNN on time-series data with a sliding window, but an in-place operation error occurs when I train the model.

The following is my code:

import torch
import numpy as np
from torch.utils.data import DataLoader
from torch import nn

data = torch.randn(100,9)

class TimeseriesDataset(torch.utils.data.Dataset):   
    def __init__(self, X, y, seq_len=1):
        self.X = X
        self.y = y
        self.seq_len = seq_len

    def __len__(self):
        return len(self.X) - (self.seq_len - 1)

    def __getitem__(self, index):
        return (self.X[index:index+self.seq_len], self.y[index:index+self.seq_len])
    
data = torch.tensor(np.array(data), dtype=torch.float32)

train_dataset = TimeseriesDataset(data[:-1], data[1:], seq_len=5)
train_loader = DataLoader(train_dataset, batch_size=1, shuffle=False)

input_size = 9
hidden_size = 9
num_layers = 3

rnn = nn.RNN(input_size=input_size, hidden_size=hidden_size, num_layers=num_layers, batch_first=True)

num_epochs = 5
learning_rate = 0.01

optimizer = torch.optim.Adam(rnn.parameters(), lr=learning_rate)

criterion = nn.CrossEntropyLoss()

hidden = None

for epoch in range(num_epochs):
    for i, d in enumerate(train_loader):
        # carry the hidden state over from the previous window
        out, hidden = rnn(d[0], hidden)

        loss = criterion(out, d[1])

        optimizer.zero_grad()
        loss.backward(retain_graph=True)
        optimizer.step()

        print(loss.item())

Running this results in the following error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [9, 9]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

This error is often caused by using retain_graph=True when it isn't actually needed. Could you explain why you are using this argument, and whether you would get another error without it?

If I don't use retain_graph=True, the following error occurs:

Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.

I found the cause of this error! It comes from the input to the RNN model.
I should be calling out, hidden = rnn(x, hidden_init) or out, hidden = rnn(x),
but I was calling out, hidden = rnn(x, hidden).

Because the output of the model is fed back in as the input (the variable hidden is replaced on every iteration), the hidden state still carries the computation graph of the previous iteration. Backward then runs through that old graph again, and the weights it needs have already been modified in place by optimizer.step(), which is where the in-place operation error comes from.
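
For reference, this is roughly how the loop looks after the fix: the hidden state is no longer carried from one iteration to the next, so backward never reaches a previous iteration's graph and retain_graph=True is no longer needed.

for epoch in range(num_epochs):
    for i, d in enumerate(train_loader):
        # let the RNN initialize its own hidden state (zeros) for every window
        out, hidden = rnn(d[0])

        loss = criterion(out, d[1])

        optimizer.zero_grad()
        loss.backward()          # retain_graph=True no longer needed
        optimizer.step()

        print(loss.item())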

Lol, I don't actually understand the details of your solution, but thank you!!! Your final statement allowed me to fix my code: I cloned the hidden state before passing it to the model. My hypothesis is that the optimizer doesn't want any variable to take on different values across iterations of the training loop. In my case, I need to call the model's forward pass several times per training-loop iteration, because within each iteration I step over all of the time points (columns) of a single input (matrix) to generate a single output (scalar) that is ultimately compared to a true label (scalar).

I would be interested if you or others have any additional insights here (a rough sketch of what my loop looks like is below). I suspect that I'm not doing this the way PyTorch intends the package to be used.
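
Roughly, my loop looks like the sketch below. The model, the read-out head, and the toy data are placeholders for my own setup, not anything from this thread. Each training iteration steps the RNN over the columns of one input matrix, clones the hidden state before each step, and calls backward() only once at the end:

import torch
from torch import nn

input_size, hidden_size, num_layers = 9, 16, 1
model = nn.RNN(input_size=input_size, hidden_size=hidden_size,
               num_layers=num_layers, batch_first=True)
head = nn.Linear(hidden_size, 1)        # placeholder read-out producing a single scalar
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(list(model.parameters()) + list(head.parameters()), lr=0.01)

# toy data: 20 (matrix, scalar label) pairs; each matrix is 9 features x 30 time points
inputs = [torch.randn(input_size, 30) for _ in range(20)]
labels = [torch.randn(1) for _ in range(20)]

for x, label in zip(inputs, labels):
    hidden = torch.zeros(num_layers, 1, hidden_size)    # fresh hidden state every iteration

    for t in range(x.shape[1]):                         # iterate over time points (columns)
        step = x[:, t].view(1, 1, -1)                   # (batch=1, seq_len=1, features)
        out, hidden = model(step, hidden.clone())       # clone before passing, as described

    prediction = head(out[:, -1])                       # single scalar prediction
    loss = criterion(prediction.view(-1), label)

    optimizer.zero_grad()
    loss.backward()                                     # one backward per training iteration
    optimizer.step()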

In the official PyTorch nn.RNN documentation it is written as follows:
out, hidden_n = rnn(x, hidden_0)
When your input sequence length is N, out = (out_1, out_2, ..., out_N), and for the last layer out_t is simply the hidden state h_t, computed as
h_t = tanh(x_t * W_ih^T + b_ih + h_(t-1) * W_hh^T + b_hh)

hidden_0 is the initialization of the hidden state (the 0th hidden state) and hidden_n is the Nth hidden state of each layer, i.e. the same recurrence evaluated at the last time step. There is no separate output weight in nn.RNN; out is just the sequence of last-layer hidden states.
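
If it helps, here is a small sanity check of that recurrence against nn.RNN for a single layer (weight_ih_l0, weight_hh_l0, bias_ih_l0 and bias_hh_l0 are the module's own parameters):

import torch
from torch import nn

rnn = nn.RNN(input_size=3, hidden_size=4, num_layers=1, batch_first=True)
x = torch.randn(1, 5, 3)                 # (batch, seq_len, features)

out, h_n = rnn(x)                        # hidden_0 defaults to zeros

# replay the recurrence by hand: h_t = tanh(x_t W_ih^T + b_ih + h_(t-1) W_hh^T + b_hh)
h = torch.zeros(4)
for t in range(x.shape[1]):
    h = torch.tanh(x[0, t] @ rnn.weight_ih_l0.T + rnn.bias_ih_l0
                   + h @ rnn.weight_hh_l0.T + rnn.bias_hh_l0)

print(torch.allclose(h, out[0, -1], atol=1e-5))   # True: out_N is the last hidden state
print(torch.allclose(h, h_n[0, 0], atol=1e-5))    # True: hidden_n holds the same value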

If you solve your problem using the following code,
out, hidden = model(x, hidden.detach())
it works because detach() cuts the hidden state off from the previous iteration's graph and turns it into a leaf tensor, so backward no longer touches the weights that the optimizer has already updated in place (clone() on its own copies the values but keeps the graph, so strictly it is the detach that avoids the in-place error).
But it means your model initializes the hidden state to the Nth hidden-state value of the prior iteration, I think (essentially truncated backpropagation through time).
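
A quick way to see the difference between clone() and detach() (just a sanity check with a made-up tensor):

import torch

h0 = torch.randn(1, 1, 4, requires_grad=True)
out = torch.tanh(h0 * 2)            # stands in for a hidden state returned by the model

print(out.clone().grad_fn)          # <CloneBackward0 ...>  -> still attached to the old graph
print(out.detach().grad_fn)         # None                  -> cut off from the graph
print(out.detach().is_leaf)         # True                  -> a leaf tensor, safe to reuse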

check this page!
https://pytorch.org/docs/stable/generated/torch.nn.RNN.html