I’m trying to profile the memory PyTorch uses for the forward and backward pass of one minibatch across various CNN layers. In doing so, I found that if a layer configuration doesn’t fit on the GPU and throws an out-of-memory error, the GPU memory is not fully freed afterward. If I then continue executing with new layer configurations, the GPU eventually runs out of memory entirely.
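Here is a simplified sketch of the loop I’m running; the layer configurations, batch size, and spatial sizes below are placeholders, not my real values:

```python
import torch
import torch.nn as nn

device = torch.device("cuda")

# Hypothetical placeholder configs: (in_channels, out_channels, spatial_size)
configs = [(64, 128, 256), (128, 256, 256), (256, 512, 256)]

for in_ch, out_ch, size in configs:
    layer = x = out = None
    try:
        layer = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1).to(device)
        x = torch.randn(32, in_ch, size, size, device=device, requires_grad=True)

        torch.cuda.reset_peak_memory_stats()
        out = layer(x)          # forward pass
        out.sum().backward()    # backward pass
        torch.cuda.synchronize()
        print(f"peak memory: {torch.cuda.max_memory_allocated() / 2**20:.1f} MiB")
    except RuntimeError as e:   # CUDA OOM surfaces as a RuntimeError
        print(f"OOM for config ({in_ch}, {out_ch}, {size})")

    # attempted cleanup before moving to the next config
    del layer, x, out
    torch.cuda.empty_cache()

    # after an OOM, the number reported here keeps growing
    print(f"allocated after cleanup: {torch.cuda.memory_allocated() / 2**20:.1f} MiB")
```

Even with the `del` and `torch.cuda.empty_cache()` calls, `torch.cuda.memory_allocated()` reports leftover memory after a configuration that hit an OOM.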
Is there a way to deallocate all the tensors PyTorch allocated for that layer, so that the only memory remaining on the GPU is the default CUDA context?