How to release allocated CUDA memory

The actual size of a batch in my case can vary, and sometimes a batch causes a CUDA out of memory error. Interestingly, during training the CUDA memory usage keeps increasing instead of fluctuating (my guess is that the model allocates memory based on the largest tensor seen so far and doesn't release it promptly), so once I get a CUDA out of memory error, I can never return to normal training. Is there any way to manually release allocated CUDA memory, so that I can use try...except to handle the out of memory error?
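For reference, a minimal sketch of the recovery pattern being asked about, assuming a PyTorch version that provides `torch.cuda.empty_cache()` (0.3+); `model`, `optimizer`, and `batch` here are placeholders, not code from the original post:

```python
import torch

def train_step(model, optimizer, batch):
    """Run one training step; on a CUDA OOM error, skip the batch
    and hand the caching allocator's unused blocks back to the driver."""
    try:
        optimizer.zero_grad()
        loss = model(batch).sum()  # placeholder loss computation
        loss.backward()
        optimizer.step()
        return loss.item()
    except RuntimeError as e:
        # PyTorch raises OOM as a RuntimeError whose message starts
        # with "CUDA out of memory"; re-raise anything else.
        if "out of memory" not in str(e):
            raise
        # Tensors created by the failed step are freed once no Python
        # reference holds them; empty_cache() then releases the
        # allocator's cached-but-unused blocks back to the CUDA driver.
        torch.cuda.empty_cache()
        return None
```

Note that `empty_cache()` only returns *unreferenced* cached blocks, so any tensors still referenced (e.g. accumulated losses kept on the GPU) must be deleted or detached first for the memory to actually be reclaimed.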


  1. What version of PyTorch are you using? print(torch.__version__)
  2. That doesn’t sound like it should be happening even when your batch sizes change. Can you share a repro or send a link to your code?

Thanks for your reply. This happens on both v0.2.0 and v0.4.x (I installed the latter from source yesterday). I will try to put together a minimal example.

Correction: it seems that v0.4.x doesn't have this problem. I will double-check.