I noticed that during testing, requires_grad would still be True for the tensors we get from the model as outputs. Would using detach() reduce the memory usage or give any other benefit?
The code during testing looks roughly like this:

for batch_idx, batch in enumerate(test_dataloader):
    imgs, targets = batch  # assuming each batch yields (imgs, targets)
    output = my_model(imgs).detach()
Using detach() won't save additional memory here, since output is not attached to a computation graph. If you print the output, you'll see that its grad_fn is missing, so detach() won't change anything.
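You can verify this by printing the attribute; a minimal sketch with a toy model standing in for your real one:

import torch
import torch.nn as nn

my_model = nn.Linear(10, 2)   # toy stand-in for the real model
imgs = torch.randn(4, 10)

out = my_model(imgs)
print(out.grad_fn)            # e.g. <AddmmBackward0 ...>: output is attached to a graph
with torch.no_grad():
    out = my_model(imgs)
print(out.grad_fn)            # None: nothing is attached, so .detach() would be a no-op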
The gradients (from the previous run) might still be allocated, though. torch.no_grad() avoids storing the intermediate activations, which would only be needed for the backward pass.
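A minimal sketch of the test loop wrapped in torch.no_grad(), reusing my_model and test_dataloader from your snippet and assuming each batch yields (imgs, targets):

my_model.eval()                                   # disable dropout / use running batchnorm stats
with torch.no_grad():                             # no graph is built, activations are freed eagerly
    for batch_idx, batch in enumerate(test_dataloader):
        imgs, targets = batch                     # assumed batch structure
        output = my_model(imgs)                   # output.grad_fn is None, no detach() needed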