Hello,
I am trying to train a model in the following fashion:
```python
# in a training loop, getting a batch x and a target y_true
optimizer.zero_grad()  # reset accumulated gradients each iteration
output = model(x)
output_processed = process_output(output)
loss = criterion(output_processed, y_true)
loss.backward()
optimizer.step()
```
I am facing issues implementing the `process_output` function. Some details about the context:

- my `output` tensor is of shape `(batch_size, output_size)`
- my `output_processed` tensor is of shape `(batch_size, some_other_size)`, where `some_other_size > output_size`
- the internals of `process_output` use slicing to assign values, for instance:
  ```python
  output_processed[:, (0, 1, 2)] = output[:, (0, 1, 2)] + output[:, (3, 4, 5)]
  ```
  (operations may be more complex, but they all consist of elementary torch operations plus slicing & broadcasting tricks)
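For concreteness, here is a toy version of that kind of operation, with made-up sizes (`output_size = 6`, `some_other_size = 9`) that are not my real ones:

```python
import torch

batch_size, output_size, some_other_size = 2, 6, 9

# a fake "model output" with known values, just for illustration
output = torch.arange(batch_size * output_size, dtype=torch.float32).reshape(
    batch_size, output_size
)

# starting from zeros here (no requires_grad involved yet)
output_processed = torch.zeros(batch_size, some_other_size)

# the kind of slice-assignment used inside process_output
output_processed[:, (0, 1, 2)] = output[:, (0, 1, 2)] + output[:, (3, 4, 5)]

print(output_processed.shape)  # torch.Size([2, 9])
```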
Now, here is the issue I'm facing. I want to update the model's weights using the loss computed from `output_processed`, so `process_output` begins like this:
```python
def process_output(output: Tensor) -> Tensor:
    output_processed = (
        torch.empty(size=(output.shape[0], some_other_size))
        .to(output.device, output.dtype)
        .requires_grad_(True)
    )
    # perform operations: this raises RuntimeError
    output_processed[:, (0, 1, 2)] = output[:, (0, 1, 2)] * 2
    # ... other operations of the sort ...
    return output_processed
```
As commented in the function above, the first operation fails and raises `RuntimeError: a view of a leaf Variable that requires grad is being used in an in-place operation.`
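The error can be reproduced without any of my model code, which is how I convinced myself it comes from the `requires_grad_(True)` call and not from the operations themselves:

```python
import torch

# a freshly created tensor is a leaf; requires_grad_(True) makes it a
# leaf that requires grad
t = torch.empty(2, 3).requires_grad_(True)

try:
    # slice-assignment is an in-place write into a view of that leaf
    t[:, 0] = 1.0
except RuntimeError as err:
    print(err)  # same "view of a leaf Variable ... in-place operation" error
```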
I am not sure how to proceed here, because I don't think I understand `requires_grad` and leaf variables correctly. From my understanding, setting `requires_grad_(False)` when instantiating the tensor should "break" my gradient computation: calling `loss.backward()` would then result in a wrong update of the model's weights, because I'm not "tracking" the gradients in `output_processed`. However, a user replied on a similar post that he managed to correctly update his model: Leaf variable was used in an inplace operation - #14 by nima_rafiee.
The more I read on leaf variables, the more confused I get. The docs for `torch.Tensor.is_leaf` say that "Tensors that have `requires_grad` which is `False` will be leaf Tensors", but then say "Only leaf Tensors will have their `grad` populated during a call to `backward()`". So on the one hand you say "that Tensor does not track gradients", but on the other hand that Tensor is a leaf Tensor, so it has its gradient populated?
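To make my confusion concrete, here is a toy example (nothing from my actual code) of what I observe: a leaf that requires grad gets `.grad` populated, while an intermediate (non-leaf) tensor does not, even though it is part of the graph:

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)  # leaf, requires grad
y = x * 2                                         # non-leaf: result of an op

y.sum().backward()

print(x.is_leaf, y.is_leaf)  # True False
print(x.grad)                # tensor([2., 2.])
# y.grad is None: non-leaf tensors do not retain their grad by default
```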
So, should I set `requires_grad_(True)` for `output_processed` or not? If yes, would creating different Tensors from the operations in `process_output` and then concatenating them at the end, rather than slicing into an already-instantiated Tensor, be a good solution to deal with the error? Thanks!
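For concreteness, the concatenation-based alternative I have in mind would look something like this (toy shapes again, and the three pieces are invented, not my real operations):

```python
import torch
from torch import Tensor


def process_output_cat(output: Tensor) -> Tensor:
    # each piece is built out-of-place from `output`,
    # so autograd can track all of them
    part_a = output[:, (0, 1, 2)] * 2
    part_b = output[:, (0, 1, 2)] + output[:, (3, 4, 5)]
    part_c = output[:, (3, 4, 5)]
    # glue the pieces along the feature dimension: (batch, 9) here
    return torch.cat((part_a, part_b, part_c), dim=1)


output = torch.randn(2, 6, requires_grad=True)
processed = process_output_cat(output)
processed.sum().backward()
print(output.grad.shape)  # torch.Size([2, 6])
```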