Hi,
I have difficulty understanding why this code does not throw an error:

import torch

model = torch.nn.Linear(512, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
inputs = torch.rand((32,512))
outputs = torch.rand((32,2))
opt.zero_grad()
pred = model(inputs)
pred[:,1] = torch.sigmoid(pred[:,1])
loss = torch.nn.functional.mse_loss(pred, outputs)
loss.backward()
opt.step()

But if I replace torch.sigmoid with torch.nn.functional.relu, it throws: RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation.
What is the hidden reason for the difference between the two activation functions?

I’m guessing … (But it’s the weekend, so let me guess away!)

At issue is not only whether you are using an inplace operation,
but also whether the tensor modified by the inplace operation is
used directly in the backward() computation.

The derivative of relu() depends on the sign of its argument
(your pred), so its backward() does, elementwise, something like:

if pred > 0.0: deriv = 1.0
else: deriv = 0.0

and complain if you have modified pred inplace.
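This saved-tensor bookkeeping can be observed directly via autograd's version counter. Here is a minimal sketch (not from the original post) that uses pow() rather than relu as the example, since d/dy y**2 = 2*y forces autograd to save the input, whereas exactly which tensor relu saves may differ across PyTorch versions:

```python
import torch

# An op whose backward needs its saved *input*, like relu in the
# question: pow() must save y because d/dy y**2 = 2*y.
x = torch.rand(5, requires_grad=True)
y = x * 1.0        # non-leaf tensor, so in-place ops on it are allowed
z = y.pow(2)       # autograd saves y for the backward pass
y.add_(1.0)        # in-place write bumps y's version counter

raised = False
try:
    z.sum().backward()
except RuntimeError:
    raised = True  # "... has been modified by an inplace operation"
print(raised)
```

The saved y no longer matches the version recorded at forward time, so backward() refuses to run.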

But the derivative of sigmoid() can be expressed simply in terms
of sigmoid() itself: sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)).
So I imagine that sigmoid()'s forward() saves the result of sigmoid()
in the context (ctx) that it will use in its backward(), and having
this information obviates the need to use the original argument to
sigmoid() in the backward() computation (and autograd is smart enough
not to complain about an inplace modification of a tensor it doesn't need).
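Both halves of this guess can be checked numerically. The identity sigmoid'(y) = s * (1 - s) means the derivative is a function of the output alone, and indeed autograd tolerates an in-place write to sigmoid's *input* after the forward pass. A sketch:

```python
import torch

# Check the identity: sigmoid'(y) == sigmoid(y) * (1 - sigmoid(y)),
# i.e. the derivative is computable from the output alone.
y = torch.rand(5, requires_grad=True)
s = torch.sigmoid(y)
s.sum().backward()
matches = torch.allclose(y.grad, s * (1 - s))

# Because backward only needs the saved output s2, modifying the
# *input* in place after the forward pass does not upset autograd:
y2 = torch.rand(5, requires_grad=True) * 1.0   # non-leaf tensor
s2 = torch.sigmoid(y2)
y2.add_(1.0)                                   # harmless in-place write
s2.sum().backward()                            # no RuntimeError
print(matches)
```

In the original question, sigmoid saves its fresh output tensor, which the slice assignment into pred never touches, while relu's backward still needed the slice of pred that the assignment overwrote.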