I use the following code snippet to calculate attention weights and get the new hidden state input for my RNN.

```
association = torch.mv(hidden_states_other[l], hidden_states[i])
probs = torch.nn.functional.softmax(association)
attention_val = torch.mv(torch.t(hidden_states_other[l]), probs)
hidden_states_weights[i] = attention_val
```

Running this gives me the error:

```
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
```

I don’t see any in-place operation. Can someone explain why this is happening?

```
hidden_states         : N x R tensor
hidden_states_other   : N x R tensor
l                     : M tensor (indices)
hidden_states_weights : N x R tensor
```


This is an in-place assignment. Most likely the best solution will be to accumulate the contents of `hidden_states_weights` as a `list`, then use `torch.cat` or `torch.stack` after the `for` loop to combine it into a single `Tensor`.
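A minimal sketch of that pattern, with made-up shapes (four steps of size-8 vectors) standing in for the real attention values:

```
import torch

outputs = []
for step in range(4):
    # stand-in for the per-step attention_val computed in the loop
    outputs.append(torch.randn(8))

# one stack after the loop, instead of indexed writes into a
# pre-allocated tensor, so autograd never sees an in-place op
hidden_states_weights = torch.stack(outputs)  # shape: (4, 8)
```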

I tried that as well, but it results in the same error:

```
list_of_attention_val = []
for i in range(seq):
    association = torch.mv(hidden_states_other[l], hidden_states[i])
    probs = torch.nn.functional.softmax(association)
    attention_val = torch.mv(torch.t(hidden_states_other[l]), probs)
    list_of_attention_val.append(attention_val)
hidden_states_weights = torch.stack(list_of_attention_val)
```

Huh, there don’t appear to be any remaining in-place ops in the code snippet you pasted. Maybe it’s somewhere else?

When I comment out this line, I don’t get the error anymore.

Also, in the above snippet, if I replace the assignment with this:

```
hidden_states_weights[i] = hidden_states_weights[i] + attention_val
```

it shouldn’t be an in-place operation anymore, the way I understand it. But I still get the same error.
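Edit: digging a bit, it looks like indexed assignment counts as in-place either way, even when the right-hand side reads the old value. A toy check using the (private) `_version` counter that newer PyTorch exposes on tensors:

```
import torch

t = torch.zeros(3)
v0 = t._version        # version counter before any writes

t[0] = 1.0             # plain indexed write
v1 = t._version        # bumped: that write was in-place

t[0] = t[0] + 1.0      # reads the old value, but still writes in place
v2 = t._version        # bumped again

print(v0, v1, v2)
```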

vvanirudh:

```
association = torch.mv(hidden_states_other[l], hidden_states[i])
probs = torch.nn.functional.softmax(association)
attention_val = torch.mv(torch.t(hidden_states_other[l]), probs)
hidden_states_weights[i] = attention_val
```

It couldn’t have been that, since if I replace the above snippet with this:

```
# association = torch.mv(hidden_states_other[l], hidden_states[i])
# probs = torch.nn.functional.softmax(association)
probs = Variable(torch.ones(torch.numel(l)).cuda())
attention_val = torch.mv(torch.t(hidden_states_other[l]), probs)
hidden_states_weights[i] = attention_val
```

It works without any error.

Even this works perfectly:

```
hidden_states_weights[i] = torch.sum(hidden_states_other[l], 0)
```

I can’t understand why this wouldn’t be an in-place operation, but the other assignments in the above posts are.

Is there a good way to debug which operation is deemed in-place and is thereby causing the error?
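(For what it’s worth, newer PyTorch versions grew `torch.autograd.detect_anomaly` for exactly this: it records a traceback for each forward op, so when backward hits the in-place error it can point at the offending line. A sketch with a deliberately broken softmax, using toy tensors:)

```
import torch

x = torch.randn(3, requires_grad=True)
with torch.autograd.detect_anomaly():
    y = torch.softmax(x, dim=0)  # softmax saves its output for backward
    y[0] = 0.0                   # in-place write clobbers that saved output
    try:
        y.sum().backward()
        raised = False
    except RuntimeError as err:
        raised = True
        # anomaly mode augments this error with the forward traceback
        print(err)
```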

I’m not sure exactly what’s happening here. Maybe @apaszke can help?

vvanirudh:

```
association = torch.mv(hidden_states_other[l], hidden_states[i])
probs = torch.nn.functional.softmax(association)
attention_val = torch.mv(torch.t(hidden_states_other[l]), probs)
hidden_states_weights[i] = attention_val
```

New update: if I clone this tensor, then I don’t get that error anymore:

```
association = torch.mv(hidden_states_other[l], hidden_states[i].clone())
```

I have no idea how this resolved the error. I know `clone()` detaches the gradient, hence I can’t use this. I am not sure if this was helpful.

jekbradbury
(James Bradbury)
April 20, 2017, 10:52pm
#11
Clone does not detach the gradient. Only `.detach` does that.
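A quick sketch of the difference, with toy tensors:

```
import torch

x = torch.ones(3, requires_grad=True)

y = x.clone()   # new memory, but still connected to the graph
z = x.detach()  # shares memory, but disconnected from the graph

y.sum().backward()
print(x.grad)           # gradients flow back through the clone
print(z.requires_grad)  # False: detach is what stops gradients
```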


So, why does `clone` resolve the error? I am still unsure how that operation makes it not an in-place operation.

albanD
(Alban D)
April 21, 2017, 8:37am
#13
Because `clone` creates a new version of the tensor with its own memory, in-place operations on the cloned tensor are not in-place on the original one. Backpropagation-wise, the clone does not change anything, as the gradients are passed through to the original tensor unchanged.
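To illustrate with a toy version of the failure mode (my reading of the above, not the thread’s actual model): softmax saves its output for backward, so writing into a clone of that output instead of the output itself keeps autograd happy:

```
import torch

x = torch.randn(3, requires_grad=True)
y = torch.softmax(x, dim=0)  # softmax saves its output y for backward

out = y.clone()       # own memory: the write below never touches y
out[0] = 0.0          # in-place, but only on the clone
out.sum().backward()  # works; softmax's saved output was left intact
print(x.grad)
```

Doing `y[0] = 0.0` directly and then calling backward would raise the in-place error instead.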


rajarsheem
(Rajarshee Mitra)
January 23, 2019, 6:48am
#14
Similar issue with me. Here is my snippet:

```
self.scorer = nn.Linear(hidden_dim, 1)
mat = []
for i in range(seq2_len):
    temp = []
    for j in range(seq1_len):
        seq2_state = sequence2_[:, i, :]
        seq1_state = sequence1_[:, j, :]
        diff = self.scorer(T.abs(seq1_state - seq2_state)).squeeze()
        temp.append(diff)
    temp = T.stack(temp)
    mat.append(temp)
mat = T.stack(mat)
mat = T.transpose(T.transpose(mat, 2, 0), 2, 1)
```

Can you please help me?


rajarsheem
(Rajarshee Mitra)
January 23, 2019, 10:37am
#15
@jekbradbury , @apaszke , @albanD can you please help me with this?