Can't identify the in-place operation


#1

I use the following code snippet to calculate attention weights and get the new hidden state input for my RNN.

 association = torch.mv(hidden_states_other[l], hidden_states[i])
 probs = torch.nn.functional.softmax(association)

 attention_val = torch.mv(torch.t(hidden_states_other[l]), probs)
 hidden_states_weights[i] = attention_val

Running this gives me the error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

I don’t see any in-place operation. Can someone explain why this is happening?

 hidden_states : N x R tensor
 hidden_states_other : N x R tensor
 l : index tensor of size M
 hidden_states_weights : N x R tensor

(James Bradbury) #2

The last line, hidden_states_weights[i] = attention_val, is an in-place assignment. The best solution is most likely to accumulate the contents of hidden_states_weights in a list, then use torch.cat or torch.stack after the for loop to combine them into a single Tensor.
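A small sketch (not from the thread) of why index assignment counts as in-place: it reads `_version`, an internal attribute PyTorch keeps on each tensor and that autograd uses to detect when a tensor saved for backward has been modified. The tensor names and sizes are made up for illustration.

```python
import torch

x = torch.zeros(3, 2)
before = x._version            # internal counter autograd checks for saved tensors
x[0] = torch.ones(2)           # x[i] = ... goes through __setitem__ and writes into x's memory
after = x._version             # the counter has been bumped by the write

# Alternative suggested above: collect rows in a list, then stack once.
rows = [torch.full((2,), float(k)) for k in range(3)]
stacked = torch.stack(rows)    # allocates a new 3 x 2 tensor; no existing tensor is overwritten
```

Relying on `_version` in real code is not advisable since it is internal; it is only used here to make the in-place write visible.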


#3

I tried that as well, but it results in the same error:

 list_of_attention_val = []
 for i in range(seq):
   association = torch.mv(hidden_states_other[l], hidden_states[i])
   probs = torch.nn.functional.softmax(association)

   attention_val = torch.mv(torch.t(hidden_states_other[l]), probs)
   list_of_attention_val.append(attention_val)
 hidden_states_weights = torch.stack(list_of_attention_val)

(James Bradbury) #4

Huh, there don’t appear to be any remaining in-place ops in the code snippet you pasted. Maybe it’s somewhere else?


#5

When I comment out this line, I no longer get the error.


#6

Also, in the snippet above, if I replace the assignment with this:

hidden_states_weights[i] = hidden_states_weights[i] + attention_val

As I understand it, this shouldn’t be an in-place operation anymore, but I still get the same error.
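A hedged sketch of what that line does (toy tensors, not the poster's model): only the addition on the right-hand side is out-of-place. Assigning the result back to an indexed position still writes into the existing tensor, which the internal `_version` counter records.

```python
import torch

states = torch.zeros(3, 2)
update = torch.ones(2)

v0 = states._version
# The right-hand side allocates a new tensor, but assigning it back to
# states[1] still copies into states' existing memory via __setitem__.
states[1] = states[1] + update
v1 = states._version
```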


#7

It can’t be somewhere else, because if I replace the snippet above with this:

# association = torch.mv(hidden_states_other[l], hidden_states[i])
# probs = torch.nn.functional.softmax(association)
probs = Variable(torch.ones(torch.numel(l)).cuda())
attention_val = torch.mv(torch.t(hidden_states_other[l]), probs)
hidden_states_weights[i] = attention_val

It works without any error.


#8

Even this works perfectly:

hidden_states_weights[i] = torch.sum(hidden_states_other[l], 0)

I don’t understand why this isn’t treated as an in-place operation while the assignments in the posts above are.

Is there a good way to find out which operation is treated as in-place and is therefore causing the error?
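One option is PyTorch's anomaly mode (`torch.autograd.detect_anomaly()` as a context manager, or `torch.autograd.set_detect_anomaly(True)` globally). When the backward pass fails, it additionally prints the traceback of the forward call whose backward failed. The sketch below triggers the same error on a toy example: `sigmoid` saves its output for backward, and that output is then modified with `mul_`. The function name is hypothetical.

```python
import torch

def find_inplace_error() -> str:
    """Run a toy forward/backward under anomaly mode and return the error text."""
    a = torch.randn(3, requires_grad=True)
    # Anomaly mode must be active during the forward pass so it can record tracebacks.
    with torch.autograd.detect_anomaly():
        b = a.sigmoid()   # sigmoid saves its output b for the backward pass
        b.mul_(2)         # in-place update bumps b's version counter
        try:
            b.sum().backward()
        except RuntimeError as err:
            # A warning with the forward traceback of sigmoid is printed before this.
            return str(err)
    return ""
```

Anomaly mode slows execution noticeably, so it is best enabled only while debugging.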


(James Bradbury) #9

I’m not sure exactly what’s happening here. Maybe @apaszke can help?


#10

Update: if I clone this tensor, I no longer get the error:

association = torch.mv(hidden_states_other[l], hidden_states[i].clone())

I have no idea why this resolves the error.
I know clone() detaches the gradient, so I can’t use this. I’m not sure whether this helps.


(James Bradbury) #11

clone() does not detach the gradient. Only .detach() does that.
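A minimal sketch contrasting the two on a toy tensor: `clone()` copies the data into new memory but stays in the autograd graph, while `detach()` shares memory with the original but is cut out of the graph.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

c = x.clone()    # new memory, still tracked by autograd (has a grad_fn)
d = x.detach()   # same memory as x, no longer tracked by autograd

(c * 2).sum().backward()   # the gradient flows back through the clone into x
```

After the backward call, `x.grad` holds 2 for every element, showing that the clone passed the gradient through unchanged.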


#12

So why does clone resolve the error? I still don’t see how it stops this from being an in-place operation.


(Alban D) #13

Because clone creates a new copy of the tensor with its own memory.
So in-place operations on the cloned tensor are not in-place on the original one. Backpropagation-wise, the clone does not change anything, as the gradients are passed to the original tensor unchanged.
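The thread never shows the full model, so the following is only a hypothetical reproduction. It assumes the row used as `hidden_states[i]` shares memory with the tensor later written by `hidden_states_weights[i] = ...`. `torch.mv` saves its vector argument for the backward pass, so writing into that memory afterwards invalidates the saved value; cloning first gives `mv` its own copy. All names here are illustrative stand-ins.

```python
import torch

def backward_succeeds(use_clone: bool) -> bool:
    """Return True if backward runs, False if autograd raises the in-place error."""
    torch.manual_seed(0)
    other = torch.randn(5, 3, requires_grad=True)  # stand-in for hidden_states_other[l]
    src = torch.randn(4, 3, requires_grad=True)
    states = src * 2.0                              # non-leaf, so in-place writes are allowed
    # states[0] is a view that shares memory (and the version counter) with states.
    query = states[0].clone() if use_clone else states[0]
    scores = torch.mv(other, query)                 # mv saves `query` for other's gradient
    probs = torch.softmax(scores, dim=0)
    attended = torch.mv(other.t(), probs)
    states[1] = attended                            # in-place write into states' memory
    try:
        attended.sum().backward()
        return True
    except RuntimeError:
        return False
```

Without the clone, the write bumps the version of the memory `query` points to and backward fails; with the clone, the saved vector is untouched and gradients still reach `src` through the clone.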