Why is my linear layer considered an in-place operation?

My model architecture is defined as the following:

import torch
import torch.nn as nn

class NN(torch.nn.Module):
    def __init__(self):
        super(NN, self).__init__()
        self.linear1 = nn.Linear(3, 9)
        self.act1 = nn.ReLU(inplace=False)
        self.linear2 = nn.Linear(9, 3)
        
    def forward(self, x):
        x = self.linear1(x)
        x = self.act1(x)
        x = self.linear2(x)
        return x

And I am trying to train the model with code similar to the following:

    model = NN()
    model = model.double()
    loss_fn = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    epochs = 100
    batch_size = 5
    iterations = total_num // batch_size

    for epoch in range(1, epochs + 1):
        train_a, ans = dataset_init(A, B, ratio)
        prediction = torch.tensor(np.zeros((total_num, 2, 3, 1, 3)))

        for iteration in range(0, iterations):
            loss_train = torch.empty(batch_size)
            for i in range(0 + batch_size * iteration, batch_size * (iteration + 1)):
                prediction[i] = model(torch.unsqueeze(train_a[i].to(torch.float64), 0))
                loss_train[i - batch_size * iteration] = loss_fn(prediction[i].clone(), ans[i])
            loss_train_avg = torch.mean(loss_train.clone())
            optimizer.zero_grad()
            loss_train_avg.backward(retain_graph=True)
            optimizer.step()

After running the code, I got:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.DoubleTensor [9, 3]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

However, I did not find any in-place operations in my code. The ‘torch.DoubleTensor [9, 3]’ in the error message looks like it is related to my 2nd linear layer, so I deleted that layer to check, and the code then ran without the error. I have also tried ‘x = self.linear2(x.clone())’, but the code still reports the same error message as above.

Now I have no idea how to fix it, and I don’t know why my linear layer is considered an in-place operation. Can anybody help me?

The in-place operation comes from optimizer.step(). Why do you need retain_graph=True? Removing it should resolve this error.
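
For illustration, here is a minimal standalone sketch (a toy model with made-up shapes, not your exact code) that reproduces this interaction: the second backward pass goes through a graph that was built before optimizer.step(), but the step has already modified the weights in-place:

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(3, 9), nn.ReLU(), nn.Linear(9, 3))
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    x = torch.randn(5, 3)
    loss = (model(x) ** 2).mean()

    loss.backward(retain_graph=True)
    optimizer.step()   # updates the weights in-place, bumping their version counters
    loss.backward()    # RuntimeError: one of the variables needed for gradient
                       # computation has been modified by an inplace operation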

Hi, thanks for your reply.

I added ‘retain_graph=True’ to solve another error. If I remove it, my code reports the following error message :frowning:

RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.

Hi Momo!

As @soulitzer noted, using retain_graph = True can often lead to
inplace-modification errors. (Based just on the code you posted, I don’t
see why you would need retain_graph = True, so you should try to
get rid of it.)

However, I don’t think that this is your main issue.

Using indices to assign into a tensor is an inplace operation, and the
tensors prediction and loss_train are in your computation graph,
so your for i in range(): loop is making such inplace modifications,
hence your error.
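
You can see this with a tiny standalone example (the names are just stand-ins for your tensors; _version is autograd’s internal version counter):

    import torch

    buf = torch.zeros(3)                # stands in for loss_train / prediction
    w = torch.ones(3, requires_grad=True)

    print(buf._version)                 # 0
    buf[0] = (2 * w).sum()              # index-assignment is an in-place write into buf
    print(buf._version)                 # 1 -- the in-place write bumped the version counter
    print(buf.grad_fn)                  # buf now has a grad_fn, i.e. it is part of the graph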

Without digging into perhaps better ways to organize your code, you could
do something like:

        loss_train_list = []
        for i in range(batch_size):
            prediction_i = model (train_a[i].to(torch.float64))
            loss_train_i = loss_fn(prediction_i.clone(),ans[i])
            loss_train_list.append (loss_train_i)
            prediction[i] = prediction_i   # if needed
            loss_train[i] = loss_train_i   # if needed
        loss_train_stack = torch.stack (loss_train_list)
        loss_train_avg = torch.mean(loss_train_stack)

This way you don’t index-assign into your loss_train tensor; rather, you
collect the per-sample losses computed in your for-loop into a list, stack
them, and pass the resulting loss_train_stack tensor to torch.mean().

(If for some reason you still need to index-assign into prediction and / or
loss_train, you can safely do so because they are no longer in the
computation graph that leads to loss_train_avg on which you call
.backward().)
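
Putting this together, the per-iteration part of your training loop might then
look something like this (just a sketch that keeps your names and your original
batch indexing, and assumes iterations is an integer). With a fresh graph built
every iteration there is no longer any reason for retain_graph = True:

    for iteration in range(iterations):
        loss_train_list = []
        for i in range(batch_size * iteration, batch_size * (iteration + 1)):
            prediction_i = model (train_a[i].to (torch.float64))
            loss_train_list.append (loss_fn (prediction_i, ans[i]))
        loss_train_avg = torch.mean (torch.stack (loss_train_list))
        optimizer.zero_grad()
        loss_train_avg.backward()   # no retain_graph = True needed
        optimizer.step()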

Best.

K. Frank

Hi, Frank

Thanks for your reply!
My problem was solved after modifying my code according to your suggestion! :smiling_face: