Gradient computation issue due to inplace operation, unsure how to debug for custom model

I am getting a gradient computation error and am unsure how to fix it: ‘one of the variables needed for gradient computation has been modified by an inplace operation’. I am trying to create a model where the inputs can undergo some arbitrary transformation, specified by a function, during the network execution. In this simple example the two transformations are adding or multiplying the top two elements of the three-element input and storing the result in the third element. The model then mixes the results by weighting them with the alphas, which are the actual parameters of the model, to give the output. Now, I understand that I am doing some inplace operations with these transformations of the input, but I want to learn the alpha parameters of the model and believe all the necessary variables exist to do this and are not overwritten anywhere. Am I making a theoretical mistake, or am I just not understanding why PyTorch is unhappy with what I’m doing?

import torch
import torch.nn as nn

# Transformation "skills": each writes a function of the first two elements
# into the third element of the state (note that this modifies s inplace).
def addb(s):
  s[2] = s[0] + s[1]
  return s

def multb(s):
  s[2] = s[0]*s[1]
  return s

skills = [addb, multb]

class AddS(nn.Module):
    def __init__(self, num_skills, state_len) -> None:
        super().__init__()
        self.num_skills = num_skills
        self.state_len = state_len
        self.alphas = torch.ones(2)
        self.alphas = torch.nn.parameter.Parameter(self.alphas)
        self.softmax = torch.nn.Softmax(dim = 0)
    
    def forward(self, x):
      out = torch.zeros(x.size()[0], x.size()[1])
      alphas_p = self.softmax(self.alphas)

      inter = torch.zeros(x.size()[0], self.num_skills, self.state_len)

      for k in range(x.size()[0]):
        # apply each skill to a copy of the k-th input
        for i in range(self.num_skills):
          inter[k, i] = skills[i](x[k].clone())
        # mix the transformed copies using the softmaxed alphas
        out[k] = alphas_p[0]*inter[k,0] + alphas_p[1]*inter[k,1]

      return out

input, labels = torch.tensor([[1,2,0], [2,3,0]]), torch.tensor([3., 5.])

model = AddS(2, 3)

criterion = nn.MSELoss()

outputs = model(input)

loss = criterion(outputs[:, 2], labels)

loss.backward()

Hi Sahil!

The tensor out is in the computation graph and assigning to an element
of a tensor by indexing into it is an inplace operation, hence your error.
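As a minimal sketch of what I mean (a toy example of my own, not your model): assigning into a tensor via indexing bumps its version counter, and if autograd has already saved that tensor for the backward pass, backward() raises exactly this error.

import torch

w = torch.ones(3, requires_grad=True)
t = torch.zeros(3)
t[0] = 2 * w[0]       # t now participates in the computation graph
y = t * t             # mul saves t for its backward pass
t[1] = 5.0            # indexed assignment modifies the saved t inplace
y.sum().backward()    # RuntimeError: ... modified by an inplace operation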

Best.

K. Frank

Hi KFrank,

Thanks for your response. This is something that I’ve done before in other models without it causing an issue, so I have yet to properly understand when PyTorch is OK with inplace operations. In fact, for this code example I was actually able to find a fix by replacing

out[k] = alphas_p[0]*inter[k,0] + alphas_p[1]*inter[k,1]

with

out[k] = alphas_p @ inter[k].clone()

The cloning was the key fix.

However, this is not my real problem; this was supposed to be a simplified example. I will try to adapt it to be closer to my real issue.

Hi Sahil!

An inplace modification of a tensor that is in the computation graph does
not necessarily cause an error – it depends on whether that tensor’s
value is actually used in the backward pass. For some explanation, please
see this post:

From your first post:

inter is also in the computation graph and is also being modified inplace
(assignment to indexed elements). Because cloning inter fixes your inplace
error, modifying inter via indexing was likely the cause of your error.
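As a small illustration of the “not necessarily” part (again a toy example of mine, not your code): multiplying by a constant doesn’t save its input for backward, so a later inplace edit is harmless, whereas t * t does save its input, and the same edit then breaks backward.

import torch

w = torch.ones(3, requires_grad=True)

t = 2.0 * w           # backward of a constant multiply needs no saved values
t[0] = 7.0            # inplace edit, but nothing in the graph saved t
t.sum().backward()    # works fine

w.grad = None
t = w * w             # saves its input (w) for backward
u = t * t             # saves t for backward
t[0] = 7.0            # inplace edit invalidates the saved t
u.sum().backward()    # RuntimeError: ... modified by an inplace operation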

Best.

K. Frank

Hi K. Frank,

Ah, thanks for that. I think it is slightly clearer now. I guess cloning meant new variables were created, none of which were modified in place, so it works out. Now, moving towards my actual model, I have introduced a small change: for each input, the add transformation takes place on one copy of the input and the multiply on another copy; these are stored in inter and then weighted-summed to give x_k. I then take x_k and simply repeat: I take copies of x_k, run each of the transformations, and take the weighted sum. I get the same inplace operation error, and I am not sure what the problem is now, as I clone and don’t inplace-modify those things. Interestingly, the error message points to an empty tensor, and I am not sure what to make of that: ‘[torch.FloatTensor []], which is output 0 of AsStridedBackward0’.

Here is the modified forward

    def forward(self, x):
      out = torch.zeros(x.size()[0], x.size()[1])
      alphas_p = self.softmax(self.alphas)

      inter = torch.zeros(x.size()[0], self.num_skills, self.state_len)
      inter2 = torch.zeros(x.size()[0], self.num_skills, self.state_len)

      for k in range(x.size()[0]):
        # first pass: apply each skill to a copy of the k-th input
        for i in range(self.num_skills):
          inter[k, i] = skills[i](x[k].clone())
        x_k = alphas_p @ inter[k].clone()
        # second pass: apply each skill to a copy of the mixed state
        for i in range(self.num_skills):
          inter2[k, i] = skills[i](x_k.clone())
        out[k] = alphas_p @ inter2[k].clone()

      return out

Many thanks,
S.

Hi Sahil!

Try wrapping your training loop in a with autograd.detect_anomaly(): block.
When the backward-pass inplace error occurs, you should get a backtrace
to the forward-pass inplace modification that is the root cause of the error.
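Something along these lines (reusing model, criterion, input, and labels from your snippet above; adapt it to your real training loop):

import torch

with torch.autograd.detect_anomaly():
    outputs = model(input)
    loss = criterion(outputs[:, 2], labels)
    loss.backward()   # the error now comes with a traceback into forward()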

I’m not sure where the issue is. But some comments:

You are still modifying inter, inter2, and out inplace. Note that
inter[k].clone() isn’t cloning inter, but, rather, inter[k], a
different tensor (that is a view into inter).
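For example (a quick check of my own, independent of your model): a slice like t[0] is a view sharing storage with t, so cloning the slice copies its values but does nothing to protect t itself from later inplace writes.

import torch

t = torch.zeros(2, 3)
v = t[0]                              # a view into t's storage
c = t[0].clone()                      # a fresh copy of that slice
print(v.data_ptr() == t.data_ptr())   # True: the view shares t's memory
t[0, 0] = 1.0                         # writes through to the view ...
print(v[0].item(), c[0].item())       # 1.0 0.0 ... but not to the clone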

You never use all of inter and inter2 at once – just their k slices.
You might try something like:

    def forward(self, x):
      out_list = []
      alphas_p = self.softmax(self.alphas)

      inter_k = torch.zeros(self.num_skills, self.state_len)
      inter2_k = torch.zeros(self.num_skills, self.state_len)

      for k in range(x.size()[0]):
        for i in range(self.num_skills):
          inter_k[i] = skills[i](x[k])
        x_k = alphas_p @ inter_k
        for i in range(self.num_skills):
          inter2_k[i] = skills[i](x_k)
        out_list.append(alphas_p @ inter2_k)

      out = torch.stack(out_list)

      return out

Best.

K. Frank

I really appreciate your help. Thank you for going above and beyond!