Confused about in-place operation in .backward()

I encountered this problem when trying to assign a value to a tensor as below (RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation). Notice that my in-place operation happens before I calculate loss = (b*b).mean().

import torch
print(torch.__version__)  # 1.8.1
torch.autograd.set_detect_anomaly(True)
a = torch.rand(3, requires_grad=True)
b = torch.sigmoid(a)
b[0] = 1
loss = (b*b).mean()
b.retain_grad()
loss.backward()  # RuntimeError: one of the variables needed for gradient computation has been modified by an in-place operation...

After that, I found that if I replace torch.sigmoid() with the step-by-step computation below, .backward() works successfully.

import torch
print(torch.__version__)  # 1.8.1
torch.autograd.set_detect_anomaly(True)
a = torch.rand(3, requires_grad=True)
# b = torch.sigmoid(a)
b = 1 / (1+torch.exp(-a))
b[0] = 1
loss = (b*b).mean()
b.retain_grad()
loss.backward()  # success
print(b.grad)  # tensor([0.6667, 0.4498, 0.4630])
print(a.grad)  # tensor([0.0000, 0.0987, 0.0982])

In addition, I found that when using .index_fill() as below, the code also works.

import torch
print(torch.__version__)  # 1.8.1
torch.autograd.set_detect_anomaly(True)
a = torch.rand(3, requires_grad=True)
b = torch.sigmoid(a)
# b[0] = 1
b = b.index_fill(0, torch.LongTensor([0]), 1)
loss = (b*b).mean()
b.retain_grad()
loss.backward()  # success
print(b.grad)  # tensor([0.6667, 0.4498, 0.4630])
print(a.grad)  # tensor([0.0000, 0.0987, 0.0982])

In conclusion, I have the following questions:

  1. Why is there an in-place error when using torch.sigmoid(), while the step-by-step version has no problem?
  2. Why does the code work after using index_fill()?

Thanks for any reply!

  1. Because torch.sigmoid saves its output for use in the backward computation, modifying the output of sigmoid “b” in-place results in an error. If you split torch.sigmoid up so that the last out-of-place operation is a division, that operation saves its input for backward instead, so it doesn’t care if you modify its output in-place (see the sketch below).
  2. index_fill is an out-of-place op; when you do b = b.index_fill(...) you are just changing which object the name “b” points to.
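
For what it's worth, here is a minimal sketch of one common way around the first point (my own addition, not part of the reply above): clone b before writing into it, so sigmoid's saved output is never touched.

import torch
a = torch.rand(3, requires_grad=True)
b = torch.sigmoid(a)   # sigmoid saves its output b for backward
c = b.clone()          # clone's backward does not need its output, so in-place edits on c are safe
c[0] = 1               # write into the clone, not into sigmoid's saved output
loss = (c * c).mean()
loss.backward()        # success
print(a.grad)          # a.grad[0] == 0, since the original c[0] was overwritten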
  1. Why is output = torch.sigmoid(input) designed this way? It could actually do the backward computation without using the output (by using just the input).
  2. In my opinion, the reason we can't use an in-place op is that the variable's value changes between forward and backward. So why is an out-of-place op OK in this situation? Maybe even though we have b = b.index_fill(...), the previous b (without index_fill) is still saved in the computation graph?
  1. It's convenient to use the output because sigmoid's derivative is sigmoid(x) * (1 - sigmoid(x)).
  2. Yes, the reason we can't use an in-place op is that you are modifying the saved variable in-place. If you do an out-of-place op here, the previous b is indeed still saved in the computation graph, except that now you are creating a new tensor object and pointing b at it. The saved variable shares storage with the old tensor object, so it is fine (see the sketch below).
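
To make both points concrete, here is a small sketch (my own illustration, not from the replies above). The second part inspects the internal ._version counter that autograd uses to detect in-place modifications; that is an undocumented attribute, so treat it as an assumption about current PyTorch internals.

import torch

# 1. sigmoid's backward only needs its output:
#    d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x))
x = torch.rand(5, requires_grad=True)
y = torch.sigmoid(x)
y.sum().backward()
print(torch.allclose(x.grad, y * (1 - y)))  # True

# 2. autograd tracks in-place edits with a per-tensor version counter
a = torch.rand(3, requires_grad=True)
b = torch.sigmoid(a)                           # b is saved for sigmoid's backward
c = b.index_fill(0, torch.LongTensor([0]), 1)  # out-of-place: new tensor object
print(c is b, b._version)                      # False 0  -> saved b untouched
b[0] = 1                                       # in-place: bumps b's version counter
print(b._version)                              # 1  -> backward through sigmoid would now fail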

I understand now. You helped me a lot, thanks!
