I'm aware that calling torch.cuda.empty_cache() releases GPU memory that is no longer bound to a Python variable but is still held in the caching allocator's pool. However, when I place the model on any GPU other than GPU 0 and call torch.cuda.empty_cache(), besides releasing memory on that GPU, about 700MB of memory on GPU 0 gets occupied. This confuses me and I can't figure out why. Does anyone know the internal mechanism of this function and can explain where the strange memory comes from?
FYI, I place the model on a specific GPU by specifying the device index in the PyTorch code, rather than by setting the CUDA_VISIBLE_DEVICES environment variable outside.
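A minimal sketch of what I'm doing (the layer sizes are arbitrary, just enough to allocate something on GPU 1):

```python
import torch

# everything is explicitly placed on GPU 1; GPU 0 is never touched directly
model = torch.nn.Linear(1024, 1024).to("cuda:1")
x = torch.randn(64, 1024, device="cuda:1")
y = model(x)
del x, y  # tensors are freed back into the caching allocator's pool

# releases GPU 1's cached blocks as expected, but nvidia-smi then also
# shows ~700MB newly occupied on GPU 0
torch.cuda.empty_cache()
```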
If this is a one-time thing, it is most likely because the call initializes the CUDA context on GPU 0.
We discourage the use of this function unless you have a very good reason to release that memory. Could you explain your use case for needing to clear the cache rather than using CUDA_VISIBLE_DEVICES?
Thanks for your quick reply and explanation! I agree that the extra memory usage is due to the CUDA context being initialized on GPU 0. However, I was wondering whether there is a way to specify which GPU the CUDA context gets initialized on, like in the sketch below.
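For example, I was hoping something along these lines would work, assuming torch.cuda.set_device or the torch.cuda.device context manager makes GPU 1 the current device before the context is lazily created:

```python
import torch

# make GPU 1 the current device so any lazily created CUDA context lands there
torch.cuda.set_device(1)

# or scope just the cache release to GPU 1
with torch.cuda.device(1):
    torch.cuda.empty_cache()
```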
My use case for torch.cuda.empty_cache() is that I need to spawn a subprocess that runs an exhaustive test after every 10 epochs of training. Releasing the cached GPU memory lets that subprocess use it and avoids OOM.
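Roughly, the loop looks like this; train_one_epoch and run_exhaustive_test.py are placeholders for my actual code:

```python
import subprocess
import torch

NUM_EPOCHS = 100  # placeholder

def train_one_epoch():
    ...  # actual training step goes here

for epoch in range(NUM_EPOCHS):
    train_one_epoch()
    if (epoch + 1) % 10 == 0:
        # return cached blocks to the driver so the child process can claim them
        torch.cuda.empty_cache()
        # the exhaustive test runs in its own process with its own allocations
        subprocess.run(["python", "run_exhaustive_test.py"], check=True)
```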
As for CUDA_VISIBLE_DEVICES, I'm sure it would get me out of this situation for now, and environment variables are certainly convenient. However, setting it modifies the environment globally, and I may need other processes to run deep learning tasks on other GPUs, potentially with a different framework such as TensorFlow. So I'd like a way to constrain GPU usage locally, without messing with the environment variables.
This sounds like a completely valid use case for empty_cache()!
For the env variable, you can set it temporarily for a single command:

CUDA_VISIBLE_DEVICES=2 python your_command.py

This way, only that Python command will have the env variable defined.
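And if you launch the test from Python rather than from the shell, you can do the same thing by passing a modified copy of the environment to just that subprocess, so the parent process and any other frameworks never see the change (the script name is just illustrative):

```python
import os
import subprocess

env = os.environ.copy()
env["CUDA_VISIBLE_DEVICES"] = "2"  # visible only to the child process

subprocess.run(["python", "run_exhaustive_test.py"], check=True, env=env)
```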