Profiling memory consumption of forward and backward pass

I’m trying to profile the memory PyTorch uses for the forward and backward pass of one minibatch for various CNN layers. While doing this I found that if a certain layer configuration doesn’t fit on the GPU and an out-of-memory error is raised, the GPU memory is not fully freed, so if I keep going with new layer configurations the GPU eventually runs out of memory completely.
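
The structure of my profiling loop is roughly the following; the layer configurations, input size, and batch size here are just placeholders for the ones I actually sweep over:

```python
import torch
import torch.nn as nn

device = torch.device("cuda")

# Placeholder configurations: (in_channels, out_channels, kernel_size)
configs = [(64, 128, 3), (128, 256, 3), (256, 512, 3)]

for in_c, out_c, k in configs:
    layer = nn.Conv2d(in_c, out_c, k, padding=1).to(device)
    x = torch.randn(32, in_c, 224, 224, device=device, requires_grad=True)

    torch.cuda.reset_peak_memory_stats()
    try:
        out = layer(x)        # forward pass of one minibatch
        out.sum().backward()  # backward pass
        torch.cuda.synchronize()
        peak = torch.cuda.max_memory_allocated() / 1024**2
        print(f"config {(in_c, out_c, k)}: peak {peak:.1f} MiB")
    except RuntimeError:      # CUDA OOM surfaces as a RuntimeError
        print(f"config {(in_c, out_c, k)}: OOM")

    # This is where I see memory from a failed configuration sticking around:
    alloc = torch.cuda.memory_allocated() / 1024**2
    print(f"still allocated after this config: {alloc:.1f} MiB")
```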

Is there a way to deallocate all the tensors PyTorch allocated for that layer, so that the only memory left on the device is the default CUDA context?
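
Concretely, what I’d like to do after a failed configuration is something along these lines (a standalone toy example rather than my actual code; `empty_cache` is my guess at the relevant call):

```python
import gc
import torch

device = torch.device("cuda")

# Stand-in for whatever the failed layer configuration left behind.
x = torch.randn(1024, 1024, device=device)
print("allocated before:", torch.cuda.memory_allocated() / 1024**2, "MiB")

# Drop the Python references, collect, and return cached blocks to the driver.
del x
gc.collect()
torch.cuda.empty_cache()

# Ideally only the CUDA context itself would still occupy the device here.
print("allocated after: ", torch.cuda.memory_allocated() / 1024**2, "MiB")
print("reserved after:  ", torch.cuda.memory_reserved() / 1024**2, "MiB")
```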

Do you have a reproducible code snippet for this behavior?
If PyTorch encounters an OOM, it should delete the current allocation, clear the cache and retry the allocation.
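
For reference, here is a minimal standalone check of that allocator behavior for a single deliberately oversized allocation (the tensor size is just an arbitrary value that should fail on any GPU); in your layer sweep the picture might differ if Python references to earlier tensors are still alive:

```python
import torch

device = torch.device("cuda")
print("baseline allocated:", torch.cuda.memory_allocated() / 1024**2, "MiB")

try:
    # Deliberately far too large, to trigger the allocator's OOM path.
    huge = torch.empty(1 << 40, device=device)
except RuntimeError as e:
    print("caught:", str(e).splitlines()[0])

# The failed allocation itself should not leave anything behind.
print("allocated:", torch.cuda.memory_allocated() / 1024**2, "MiB")
print("reserved: ", torch.cuda.memory_reserved() / 1024**2, "MiB")
```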