I use the following code snippet to calculate attention weights and get the new hidden state input for my RNN.
association = torch.mv(hidden_states_other[l], hidden_states[i])
probs = torch.nn.functional.softmax(association)
attention_val = torch.mv(torch.t(hidden_states_other[l]), probs)
hidden_states_weights[i] = attention_val
Running this gives me the error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
I don’t see any in-place operation here. Can someone explain why this is happening?
hidden_states : N x R tensor
hidden_states_other : N x R tensor
l : M tensor (indices)
hidden_states_weights : N x R tensor
This is an in-place assignment. Most likely the best solution will be to accumulate the contents of hidden_states_weights in a list, then use torch.cat or torch.stack after the for loop to combine it into a single tensor.
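To illustrate the accumulate-then-stack pattern: a minimal, self-contained sketch with made-up shapes standing in for the original variables (hidden_states_other_l here plays the role of hidden_states_other[l], and dim=0 is passed to softmax explicitly, which newer PyTorch versions require):

```python
import torch

# Hypothetical shapes standing in for the original N x R / M-index setup.
M, R, seq = 4, 3, 5
hidden_states_other_l = torch.randn(M, R, requires_grad=True)
hidden_states = torch.randn(seq, R, requires_grad=True)

attention_vals = []
for i in range(seq):
    association = torch.mv(hidden_states_other_l, hidden_states[i])   # (M,)
    probs = torch.nn.functional.softmax(association, dim=0)           # (M,)
    attention_vals.append(torch.mv(hidden_states_other_l.t(), probs)) # (R,)

# Combine outside the loop instead of writing into a preallocated tensor.
hidden_states_weights = torch.stack(attention_vals)  # (seq, R)
hidden_states_weights.sum().backward()  # backward works: no in-place writes
```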
I tried that as well, but it results in the same error:
list_of_attention_val = []
for i in range(seq):
    association = torch.mv(hidden_states_other[l], hidden_states[i])
    probs = torch.nn.functional.softmax(association)
    attention_val = torch.mv(torch.t(hidden_states_other[l]), probs)
    list_of_attention_val.append(attention_val)
hidden_states_weights = torch.stack(list_of_attention_val)
Huh, there don’t appear to be any remaining in-place ops in the code snippet you pasted. Maybe it’s somewhere else?
When I comment out this line, I don’t get the error anymore.
Also, in the above snippet, if I replace the assignment with this:
hidden_states_weights[i] = hidden_states_weights[i] + attention_val
it shouldn’t be an in-place operation anymore, the way I understand it. But I still get the same error.
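For what it’s worth, the reason the modified assignment still fails is that indexed assignment is itself an in-place write (a sliced copy under the hood), regardless of how out-of-place the right-hand side is computed. A standalone reproduction (hypothetical tensors, not the original variables):

```python
import torch

a = torch.randn(3, requires_grad=True)
b = a.exp()        # exp saves its output for its backward pass
b[0] = b[0] + 1.0  # RHS is out-of-place, but the indexed assignment writes into b
try:
    b.sum().backward()
    failed = False
except RuntimeError:
    failed = True  # "... has been modified by an inplace operation"
```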
vvanirudh:
association = torch.mv(hidden_states_other[l], hidden_states[i])
probs = torch.nn.functional.softmax(association)
attention_val = torch.mv(torch.t(hidden_states_other[l]), probs)
hidden_states_weights[i] = attention_val
It couldn’t have been that, since if I replace the above snippet with this:
# association = torch.mv(hidden_states_other[l], hidden_states[i])
# probs = torch.nn.functional.softmax(association)
probs = Variable(torch.ones(torch.numel(l)).cuda())
attention_val = torch.mv(torch.t(hidden_states_other[l]), probs)
hidden_states_weights[i] = attention_val
It works without any error.
Even this works perfectly:
hidden_states_weights[i] = torch.sum(hidden_states_other[l], 0)
I can’t understand why this wouldn’t be an in-place operation, but the other assignments in the above posts are.
Is there a good way to debug which operation is considered in-place and is therefore causing the error?
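One option on recent PyTorch versions (newer than this thread) is autograd anomaly detection, which augments the in-place error with a traceback of the forward call whose saved tensor was modified. A standalone sketch, again with hypothetical tensors:

```python
import torch

# Anomaly mode is slow; enable it only while debugging.
with torch.autograd.detect_anomaly():
    a = torch.randn(3, requires_grad=True)
    b = a.exp()   # exp saves its output for backward
    b[0] = 0.0    # in-place write invalidates the saved output
    try:
        b.sum().backward()
        caught = False
    except RuntimeError:
        caught = True  # the error now points at the exp() forward call
```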
I’m not sure exactly what’s happening here. Maybe @apaszke can help?
vvanirudh:
association = torch.mv(hidden_states_other[l], hidden_states[i])
probs = torch.nn.functional.softmax(association)
attention_val = torch.mv(torch.t(hidden_states_other[l]), probs)
hidden_states_weights[i] = attention_val
New update: if I clone this tensor, I don’t get the error anymore:
association = torch.mv(hidden_states_other[l], hidden_states[i].clone())
I have no idea how this resolved the error. I know clone() detaches the gradient, hence I can’t use this. I am not sure if this workaround is helpful.
jekbradbury (James Bradbury), April 20, 2017, 10:52pm
Clone does not detach the gradient. Only .detach does that.
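A quick sketch of the difference (hypothetical tensors, not the original code): a clone stays attached to the graph and passes gradients back to its source, while detach cuts the tensor out of the graph entirely.

```python
import torch

x = torch.randn(3, requires_grad=True)
c = x.clone()   # new memory, still part of the autograd graph
d = x.detach()  # shares memory with x, but cut from the graph

c.sum().backward()  # gradients flow through the clone back to x
```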
So why does clone resolve the error? I am still unsure how that operation makes it not an in-place operation.
albanD (Alban D), April 21, 2017, 8:37am
Because clone creates a new copy of the tensor with its own memory. In-place operations on the cloned tensor are therefore not in-place on the original one. Backpropagation-wise, the clone changes nothing: gradients are passed through to the original tensor unchanged.
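A small standalone sketch of this behaviour (hypothetical tensors, not the original variables): writing into a clone leaves the original tensor’s saved values, and hence its backward pass, intact.

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x.exp()   # exp's backward needs the saved value of y
z = y.clone() # fresh memory: writes to z leave y untouched
z[0] = 0.0    # in-place write on the clone only
(z.sum() + y.sum()).backward()  # succeeds; modifying y directly would error
```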
rajarsheem (Rajarshee Mitra), January 23, 2019, 6:48am
I have a similar issue. Here is my snippet:
self.scorer = nn.Linear(hidden_dim, 1)
mat = []
for i in range(seq2_len):
    temp = []
    for j in range(seq1_len):
        seq2_state = sequence2_[:, i, :]
        seq1_state = sequence1_[:, j, :]
        diff = self.scorer(T.abs(seq1_state - seq2_state)).squeeze()
        temp.append(diff)
    temp = T.stack(temp)
    mat.append(temp)
mat = T.stack(mat)
mat = T.transpose(T.transpose(mat, 2, 0), 2, 1)
Can you please help me?
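As an aside, the nested loops above can be collapsed with broadcasting, which avoids the per-step stacking and transposes entirely. A sketch under the assumption that sequence1_ and sequence2_ are (batch, seq_len, hidden_dim) tensors (shapes here are made up):

```python
import torch
import torch.nn as nn

batch, seq1_len, seq2_len, hidden_dim = 2, 4, 3, 8
sequence1_ = torch.randn(batch, seq1_len, hidden_dim)
sequence2_ = torch.randn(batch, seq2_len, hidden_dim)
scorer = nn.Linear(hidden_dim, 1)

# Broadcast to (batch, seq2_len, seq1_len, hidden_dim), score, drop last dim.
diff = (sequence1_.unsqueeze(1) - sequence2_.unsqueeze(2)).abs()
mat = scorer(diff).squeeze(-1)  # (batch, seq2_len, seq1_len), as after the transposes
```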
rajarsheem (Rajarshee Mitra), January 23, 2019, 10:37am
@jekbradbury , @apaszke , @albanD can you please help me with this?