Can't identify the inplace operation

I use the following code snippet to calculate attention weights and get the new hidden state input for my RNN.

    association = torch.mv(hidden_states_other[l], hidden_states[i])    # attention scores
    probs = torch.nn.functional.softmax(association)                    # normalize to weights

    attention_val = torch.mv(torch.t(hidden_states_other[l]), probs)    # weighted sum of hidden states
    hidden_states_weights[i] = attention_val                            # write result into row i

Running this gives me the error:

    RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

I don’t see any in-place operation. Can someone explain why this is happening? The shapes are:

    hidden_states : N x R tensor
    hidden_states_other : N x R tensor
    l : M tensor (index tensor)
    hidden_states_weights : N x R tensor

The indexed assignment hidden_states_weights[i] = attention_val is an in-place operation. Most likely the best solution is to accumulate the per-step results in a Python list, then use torch.cat or torch.stack after the for loop to combine them into a single tensor.

I tried that as well, but it results in the same error:

    list_of_attention_val = []
    for i in range(seq):
        association = torch.mv(hidden_states_other[l], hidden_states[i])   # attention scores
        probs = torch.nn.functional.softmax(association)                   # normalize to weights

        attention_val = torch.mv(torch.t(hidden_states_other[l]), probs)   # weighted sum
        list_of_attention_val.append(attention_val)                        # accumulate; no indexed writes
    hidden_states_weights = torch.stack(list_of_attention_val)             # combine after the loop

Huh, there don’t appear to be any remaining in-place ops in the code snippet you pasted. Maybe it’s somewhere else?

When I comment out this line, I don’t get the error anymore.

Also, in the above snippet, if I replace the assignment with this:

    hidden_states_weights[i] = hidden_states_weights[i] + attention_val

it shouldn’t be an in-place operation anymore, the way I understand it. But I still get the same error.
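
For what it’s worth, an indexed assignment always mutates the target tensor in place, no matter what the right-hand side computes: the addition above is out-of-place, but the write into hidden_states_weights is not. A minimal check using the internal _version counter, which autograd bumps on every in-place write (a sketch, assuming a reasonably recent PyTorch):

    import torch

    t = torch.zeros(3)
    print(t._version)    # 0
    t[0] = t[0] + 1.0    # the addition is out-of-place, but the indexed
                         # assignment still writes into t's memory
    print(t._version)    # 1 -- autograd counts this as an in-place op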

It couldn’t have been somewhere else, since if I replace the above snippet with this:

    # association = torch.mv(hidden_states_other[l], hidden_states[i])
    # probs = torch.nn.functional.softmax(association)
    probs = Variable(torch.ones(torch.numel(l)).cuda())                 # constant weights instead of softmax
    attention_val = torch.mv(torch.t(hidden_states_other[l]), probs)
    hidden_states_weights[i] = attention_val

It works without any error.

Even this works perfectly:

    hidden_states_weights[i] = torch.sum(hidden_states_other[l], 0)

I can’t understand why this assignment doesn’t trigger the in-place error while the other assignments in the posts above do.

Is there a good way to debug which operation is treated as in-place and is therefore causing the error?
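
One option, in newer PyTorch releases than the one used in this thread, is autograd’s anomaly detection, which prints the traceback of the forward operation whose saved tensor was overwritten. A minimal sketch reproducing this class of error:

    import torch

    torch.autograd.set_detect_anomaly(True)   # available in newer PyTorch releases

    x = torch.ones(3, requires_grad=True)
    y = x.exp()          # exp() saves its output for the backward pass
    y[0] = 0.0           # in-place write clobbers that saved output
    y.sum().backward()   # raises the in-place error; anomaly mode points
                         # at the exp() call in the forward traceback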

I’m not sure exactly what’s happening here. Maybe @apaszke can help?

New update: if I clone this tensor, then I don’t get that error anymore:

    association = torch.mv(hidden_states_other[l], hidden_states[i].clone())

I have no idea how this resolved the error. As I understand it, clone() detaches the gradient, so I can’t use this approach. I am not sure if this is helpful.

clone() does not detach the gradient. Only .detach() does that.
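
A minimal check of that distinction:

    import torch

    x = torch.ones(3, requires_grad=True)
    print(x.clone().requires_grad)    # True  -- clone stays in the graph
    print(x.detach().requires_grad)   # False -- detach cuts it out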


So why does clone() resolve the error? I am still unsure how that operation makes the assignment no longer count as in-place.

Because clone() creates a new version of the tensor with its own memory, in-place operations on the cloned tensor are not in-place on the original one. Backpropagation-wise, the clone changes nothing: gradients are passed through to the original tensor unchanged.


I have a similar issue. Here is my snippet:

    self.scorer = nn.Linear(hidden_dim, 1)
    mat = []
    for i in range(seq2_len):
        temp = []
        for j in range(seq1_len):
            seq2_state = sequence2_[:, i, :]
            seq1_state = sequence1_[:, j, :]
            # score each (i, j) pair from the absolute difference of states
            diff = self.scorer(T.abs(seq1_state - seq2_state)).squeeze()
            temp.append(diff)
        temp = T.stack(temp)
        mat.append(temp)

    mat = T.stack(mat)
    mat = T.transpose(T.transpose(mat, 2, 0), 2, 1)   # T is an alias for torch

Can you please help me?


@jekbradbury, @apaszke, @albanD, can you please help me with this?