How do in-place operations affect autograd?

ChanggongZhang · December 3, 2018, 4:50pm

Here is my simple code:

import torch
import torch.nn as nn

a = torch.tensor([-2, -1, .0, 1, 2], dtype=torch.float, requires_grad=True)
b = torch.tensor([1., 2, 3, 4, 5], dtype=torch.float, requires_grad=True)
relu = nn.ReLU(inplace=True)
c = a * 2
c *= b  # in-place multiply; b's gradient depends on the value of c before this line
c = relu(c)  # in-place ReLU on the same tensor
y = torch.sum(c)
c.register_hook(print)  # prints the gradient flowing into c during backward
y.backward()
print("b.grad:", b.grad)

The gradient of vector b relies on the values of vector c, and c undergoes two in-place operations (the multiplication and the ReLU). I expected this code to fail because c has changed after c *= b. However, it runs fine and computes the gradient of b correctly.

Could somebody explain which part I misunderstand?

Best,
Changgong

SimonW · December 12, 2018, 2:26am

When doing backward, autograd checks that every tensor it needs has not been modified since it was saved. Not all tensors are needed in backward, so some in-place ops are fine.
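
A minimal sketch of that check, assuming a recent PyTorch release. The tensor names are illustrative, and `_version` is an internal attribute exposing the per-tensor version counter, so treat it as a debugging aid rather than a stable API. exp() saves its output for backward, so modifying that output in place trips the check; the output of an addition is not needed for backward, so modifying it is fine.

```python
import torch

x = torch.tensor([0.0, 1.0, 2.0], requires_grad=True)

# Case 1: exp() saves its output, because d/dx exp(x) = exp(x).
out = x.exp()
version_before = out._version   # internal version counter (debugging aid, assumption)
out.add_(1)                     # in-place op bumps the counter
version_after = out._version

try:
    out.sum().backward()
    failed = False
except RuntimeError as err:
    # Autograd reports that a tensor it saved was modified by an in-place operation.
    failed = True
    print("backward failed:", err)

# Case 2: addition does not need its output in backward,
# so editing that output in place is fine.
x.grad = None
out2 = x + 1
out2.mul_(3)                    # in-place, but no saved tensor is invalidated
out2.sum().backward()
print("x.grad:", x.grad)        # gradient of sum(3 * (x + 1)) with respect to x
```

The first backward call should raise, while the second goes through, which is the "some in-place ops are fine" case.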

ChanggongZhang · December 12, 2018, 4:06am

Apparently, the gradient of b does rely on the value of c, right?
But afterwards, due to both the in-place multiplication and the in-place ReLU on c, c has its value changed, right?
Then autograd should have failed to compute the gradient of b, right?

Did I misunderstand something here?

SimonW · December 12, 2018, 6:04am

If an in-place op that requires grad is performed and the input it modifies is needed for backward, a copy of that input is saved. E.g., mul_ has this in VariableType.cpp:

  if (compute_requires_grad( self, other )) {
    grad_fn = std::shared_ptr<MulBackward0>(new MulBackward0(), deleteFunction);
    grad_fn->set_next_edges(collect_next_edges( self, other ));
    grad_fn->self_ = SavedVariable(self.clone(), false);
    grad_fn->other_ = SavedVariable(other, false);
  }
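
To see the effect of that saved copy from Python, here is a small sketch based on the code in the first post. The names `pre_mul`, `mask`, and `expected` are introduced here for illustration. It keeps an explicit detached copy of c before the in-place multiplication and checks that b.grad matches the gradient computed by hand from that copy, not from the final value of c.

```python
import torch
import torch.nn as nn

a = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0], requires_grad=True)
b = torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0], requires_grad=True)
relu = nn.ReLU(inplace=True)

c = a * 2
pre_mul = c.detach().clone()    # value of c before any in-place op, i.e. a * 2
c *= b                          # autograd keeps its own copy of the old c
c = relu(c)                     # second in-place op on the same tensor
y = c.sum()
y.backward()

# By hand: dy/db = (c > 0) * (value of c before the multiplication)
mask = (c.detach() > 0).float()
expected = mask * pre_mul

print("b.grad:  ", b.grad)
print("expected:", expected)
print("match:", torch.allclose(b.grad, expected))
```

If the gradient were computed from the final, overwritten c instead of the saved copy, the two tensors would differ in the last two entries.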