What is the difference between detach(), detach_(), and with torch.no_grad() during the training phase?

Your assumptions are correct.
Operations wrapped in a with torch.no_grad() block won’t be tracked by Autograd, so the intermediate tensors that would be needed for the backward pass won’t be stored.
That being said, you should not wrap the complete forward pass in this block during training, as you won’t be able to calculate the gradients.
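
As a quick illustration (a minimal sketch with made-up tensors, not from the original post), you can see that outputs created inside the block have no graph attached, while the same operation outside the block does:

```python
import torch

x = torch.randn(3, requires_grad=True)

# Inside no_grad, operations are not recorded by Autograd
with torch.no_grad():
    y = x * 2
print(y.requires_grad)  # False -> no graph, so you cannot backprop through y

# Outside the block, the graph is built as usual
z = x * 2
z.sum().backward()
print(x.grad)           # gradients are populated
```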

detach() operates on a tensor and returns a new tensor that shares the same data but is detached from the computation graph at this point, so the backward pass will stop there.
detach_() is the in-place version of detach(), as shown in the sketch below.
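
Here is a small sketch of both (again with hypothetical tensors, just to show the behavior):

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x * 2

y_det = y.detach()          # new tensor sharing y's data, cut from the graph
print(y_det.requires_grad)  # False

out = (y_det * 3).sum()
# out.backward() would raise an error here, since out has no path back to x
# and no tensor in its history requires gradients.

y.detach_()                 # detaches y itself from the graph, in place
print(y.requires_grad)      # False
```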
