```python
import torch
import torch.nn as nn

a = torch.tensor([-2, -1, .0, 1, 2], dtype=torch.float, requires_grad=True)
b = torch.tensor([1., 2, 3, 4, 5], dtype=torch.float, requires_grad=True)
relu = nn.ReLU(inplace=True)
c = a * 2
c *= b  # the gradient of b relies on the value of c
c = relu(c)
y = torch.sum(c)
c.register_hook(print)
y.backward()
print("b.grad:", b.grad)
```

The gradient of b relies on the values of c, and c goes through two in-place operations (the multiplication and the ReLU). I expected this code to fail because c is changed by c *= b. However, the code runs without error and computes the correct gradient of b.
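Writing the forward pass out step by step shows exactly which values of c the gradient of b needs. The names c^{(0)}, c^{(1)}, c^{(2)} are introduced here only for illustration, and the ReLU derivative at 0 is taken as 0, as PyTorch does:

```latex
\begin{aligned}
c^{(0)} &= 2a, \qquad c^{(1)} = c^{(0)} \odot b, \qquad c^{(2)} = \mathrm{ReLU}\big(c^{(1)}\big), \qquad y = \textstyle\sum_i c^{(2)}_i \\
\frac{\partial y}{\partial b_i} &= \mathbb{1}\big[c^{(2)}_i > 0\big]\, c^{(0)}_i = \mathbb{1}\big[c^{(2)}_i > 0\big]\, 2a_i \\
\frac{\partial y}{\partial a_i} &= \mathbb{1}\big[c^{(2)}_i > 0\big]\, 2b_i
\end{aligned}
```

So backward needs c^{(0)}, the value of c before c *= b, plus the final ReLU output c^{(2)} for the mask. With the inputs above this gives b.grad = [0, 0, 0, 2, 4].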

Could somebody explain which part I am misunderstanding?

When backward runs, autograd checks that none of the tensors it saved for the backward pass have been modified since it recorded them. Not every tensor is needed in backward, so some in-place ops are fine.
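A minimal sketch of that check, assuming a recent PyTorch version. Here _version is an internal, underscore-prefixed attribute, read only to make the version counter visible. Case 1 modifies a tensor that backward needs (exp saves its output); case 2 modifies a tensor that nothing saved:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Case 1: exp saves its output y, because d/dx exp(x) = exp(x).
y = x.exp()
version_before = y._version   # internal version counter of y
y.add_(1)                     # in-place op bumps the counter
version_after = y._version
message = ""
try:
    y.sum().backward()
    saved_tensor_check_failed = False
except RuntimeError as err:
    # autograd sees that the saved y is at a newer version than it recorded
    saved_tensor_check_failed = True
    message = str(err)
print("case 1 failed:", saved_tensor_check_failed)

# Case 2: multiplying by a Python scalar does not save its output z,
# so modifying z in place afterwards is harmless.
x.grad = None
z = x * 2
z.add_(1)
z.sum().backward()
print("case 2 x.grad:", x.grad)  # tensor([2., 2., 2.])
```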

Apparently, the gradient of b does rely on the value of c, right?
But afterwards, due to both the in-place multiplication and the in-place ReLU on c, the value of c has changed, right?
Then autograd should have failed to compute the gradient of b, right?

If an in-place op that requires grad is performed and the input it overwrites is needed for backward, a copy of that input is saved first. For example, mul_ has this in VariableType.cpp:

```cpp
if (compute_requires_grad( self, other )) {
  grad_fn = std::shared_ptr<MulBackward0>(new MulBackward0(), deleteFunction);
  grad_fn->set_next_edges(collect_next_edges( self, other ));
  grad_fn->self_ = SavedVariable(self.clone(), false);
  grad_fn->other_ = SavedVariable(other, false);
}
```
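A minimal sketch to check this, assuming a recent PyTorch version. It reruns the question's example and compares b.grad with the value c had before c *= b, which only works because mul_ kept a copy of the old c. It then contrasts an out-of-place multiplication, where c itself is saved, so modifying c afterwards makes backward fail. The names c_before, a2, b2, c2, d2 are illustrative only:

```python
import torch
import torch.nn as nn

a = torch.tensor([-2., -1., 0., 1., 2.], requires_grad=True)
b = torch.tensor([1., 2., 3., 4., 5.], requires_grad=True)

c = a * 2
c_before = c.detach().clone()   # our own copy, used only for the comparison
c *= b                          # mul_: autograd saves a clone of the old c
c = nn.ReLU(inplace=True)(c)    # relu_: saves its output, never modified later
c.sum().backward()

# d(sum)/db = old c, masked where the ReLU output is positive
expected_b_grad = c_before * (c.detach() > 0)
print(torch.equal(b.grad, expected_b_grad))  # True

# Contrast: out-of-place mul saves c2 itself (needed for b2's gradient).
a2 = torch.tensor([-2., -1., 0., 1., 2.], requires_grad=True)
b2 = torch.tensor([1., 2., 3., 4., 5.], requires_grad=True)
c2 = a2 * 2
d2 = c2 * b2
c2.mul_(10)                     # overwrite the tensor that d2's backward needs
try:
    d2.sum().backward()
    out_of_place_failed = False
except RuntimeError:
    out_of_place_failed = True
print("out-of-place case failed:", out_of_place_failed)  # True
```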