Memory Inflation Without no_grad and .eval()

I'm pretty new to PyTorch and working on my first training run. I noticed that after an epoch of training, when I ran validation without switching to eval mode and without using no_grad, GPU memory consumption went sky high until there was no GPU memory left. It would be awesome if somebody could explain what is happening under the hood to cause this.

If you don't use with torch.no_grad(), Autograd will create the computation graph, which is needed to calculate the gradients in the backward pass.
If you then store a tensor that is attached to this computation graph, such as the loss or the model output, in e.g. a list, the whole computation graph will be kept alive as well, so memory grows with every stored tensor.
To avoid the increased memory usage, you could either wrap the validation code in torch.no_grad() or call tensor.detach() before storing it. Note that model.eval() does not reduce memory usage on its own; it only changes the behavior of layers such as dropout and batch norm, so you still need no_grad (or detach) for the memory issue.
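For reference, a minimal sketch of the safer validation pattern (the model, criterion, and data here are just toy placeholders, not your actual setup):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy model and validation data, only to illustrate the pattern
model = nn.Linear(10, 1)
criterion = nn.MSELoss()
val_loader = DataLoader(
    TensorDataset(torch.randn(64, 10), torch.randn(64, 1)),
    batch_size=16,
)

val_losses = []
model.eval()                        # switch dropout/batchnorm layers to eval behavior
with torch.no_grad():               # no computation graph is built inside this block
    for data, target in val_loader:
        output = model(data)
        loss = criterion(output, target)
        # store a plain Python number (or loss.detach()) so no graph is kept alive
        val_losses.append(loss.item())
```

If you do need to keep the output tensors themselves (e.g. for computing metrics later), append output.detach() instead of output, so the stored tensors are cut off from the graph.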
