Confused about in-place operation in .backward()

I encountered this problem when trying to assign a value to a tensor as below (RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation). Notice that my in-place operation happens before I calculate loss = (b*b).mean().

import torch
print(torch.__version__)  # 1.8.1
torch.autograd.set_detect_anomaly(True)
a = torch.rand(3, requires_grad=True)
b = torch.sigmoid(a)
b[0] = 1
loss = (b*b).mean()
b.retain_grad()
loss.backward()  # RuntimeError: one of the variables needed for gradient computation has been modified by an in-place operation...

After that, I found that if I replace torch.sigmoid() with the step-by-step computation below, .backward() works successfully.

import torch
print(torch.__version__)  # 1.8.1
torch.autograd.set_detect_anomaly(True)
a = torch.rand(3, requires_grad=True)
# b = torch.sigmoid(a)
b = 1 / (1+torch.exp(-a))
b[0] = 1
loss = (b*b).mean()
b.retain_grad()
loss.backward()  # success
print(b.grad)  # tensor([0.6667, 0.4498, 0.4630])
print(a.grad)  # tensor([0.0000, 0.0987, 0.0982])

In addition, I found that when using .index_fill() as below, the code also works.

import torch
print(torch.__version__)  # 1.8.1
torch.autograd.set_detect_anomaly(True)
a = torch.rand(3, requires_grad=True)
b = torch.sigmoid(a)
# b[0] = 1
b = b.index_fill(0, torch.LongTensor([0]), 1)
loss = (b*b).mean()
b.retain_grad()
loss.backward()  # success
print(b.grad)  # tensor([0.6667, 0.4498, 0.4630])
print(a.grad)  # tensor([0.0000, 0.0987, 0.0982])

In conclusion, I have the following questions:

  1. Why is there an in-place error when using torch.sigmoid(), while the step-by-step version has no problem?
  2. Why does the code work after using index_fill()?

Thanks for any reply!

  1. Because torch.sigmoid saves its output for use in the backward computation, modifying the output of sigmoid “b” in-place results in an error. If you split torch.sigmoid up so that the last out-of-place operation is a division, that operation saves its input for backward instead, so it doesn’t care if you modify its output in-place (see the sketch below).
  2. index_fill is an out-of-place op; when you do b = b.index_fill(...) you are just changing which object the name “b” points to.
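
For what it's worth, here is a minimal sketch of one common way around the first point (my own addition, not part of the reply above): clone b before writing into it, so sigmoid's saved output is never touched.

import torch
a = torch.rand(3, requires_grad=True)
b = torch.sigmoid(a)   # sigmoid saves its output b for backward
c = b.clone()          # clone's backward does not need its output, so in-place edits on c are safe
c[0] = 1               # write into the clone, not into sigmoid's saved output
loss = (c * c).mean()
loss.backward()        # success
print(a.grad)          # a.grad[0] == 0, since the original c[0] was overwritten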
  1. Why is output = torch.sigmoid(input) designed this way? It could actually do the backward computation without using the output (by using just the input).
  2. In my opinion, the reason we can't use an in-place op is that the variable's value changes between forward and backward. So why is an out-of-place op OK in this situation? Maybe even though we have b = b.index_fill(...), the previous b (without index_fill) is still saved in the computation graph?
  1. It's convenient to use the output because sigmoid's derivative is sigmoid(x) * (1 - sigmoid(x)).
  2. Yes, the reason we can't use an in-place op is that you are modifying the saved variable in-place. If you do an out-of-place op here, the previous b is indeed still saved in the computation graph, except that now you are creating a new tensor object and pointing b at it. The saved variable shares storage with the old tensor object, so it is fine (see the sketch below).
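
To make both points concrete, here is a small sketch (my own illustration, not from the replies above). The second part inspects the internal ._version counter that autograd uses to detect in-place modifications; that is an undocumented attribute, so treat it as an assumption about current PyTorch internals.

import torch

# 1. sigmoid's backward only needs its output:
#    d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x))
x = torch.rand(5, requires_grad=True)
y = torch.sigmoid(x)
y.sum().backward()
print(torch.allclose(x.grad, y * (1 - y)))  # True

# 2. autograd tracks in-place edits with a per-tensor version counter
a = torch.rand(3, requires_grad=True)
b = torch.sigmoid(a)                           # b is saved for sigmoid's backward
c = b.index_fill(0, torch.LongTensor([0]), 1)  # out-of-place: new tensor object
print(c is b, b._version)                      # False 0  -> saved b untouched
b[0] = 1                                       # in-place: bumps b's version counter
print(b._version)                              # 1  -> backward through sigmoid would now fail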

I understand now. You helped me a lot, thanks!
