Is there any solution for this? We work on a shared server, and sometimes I need to free GPU memory for other users without killing the whole kernel. Your code indeed frees the reserved memory (torch.cuda.memory_reserved() returns 0), but nvidia-smi still shows that my kernel is occupying the memory.
PS: I use JupyterLab, which is why I sometimes still need the kernel after my model has finished training.
nvidia-smi shows the memory allocated by all processes. If you are only running PyTorch, the CUDA context would still use device memory (~1 GB, depending on the GPU, driver, etc.) and cannot be released without stopping the Python kernel.
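For reference, a minimal sketch of the freeing procedure being discussed (the helper name `free_cuda_cache` and the `model`/`optimizer` names are placeholders, not part of the PyTorch API); even after this runs successfully, nvidia-smi will keep reporting the CUDA context:

```python
import gc
import torch

def free_cuda_cache():
    """Return PyTorch's cached GPU blocks to the driver and report how many
    bytes the caching allocator still reserves. nvidia-smi will keep showing
    the CUDA context (roughly ~1 GB) until the Python process exits."""
    gc.collect()  # collect reference cycles so dead tensors are actually freed
    if torch.cuda.is_available():
        torch.cuda.empty_cache()             # release unused cached blocks
        return torch.cuda.memory_reserved()  # 0 once nothing holds GPU memory
    return 0

# Typical use in a notebook after training:
# del model, optimizer  # drop every Python reference to the GPU tensors first
# free_cuda_cache()
```

The `del` step matters: empty_cache() can only release blocks that no live tensor occupies, so any surviving reference (a variable, a closure, a cached output) keeps its memory reserved.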
Thank you for your reply. I am afraid nvidia-smi shows all the GPU memory occupied by my notebook. For instance, if I train a model that needs 15 GB of GPU memory and then free the space with torch (following the procedure in your code), torch.cuda.memory_reserved() will return 0, but nvidia-smi will still show 15 GB.
nvidia-smi indeed shows all allocated memory, so if it's still showing 15 GB then some application is still using it. If you are not seeing any memory usage (either allocated or in the cache) via torch.cuda.memory_summary(), then another application (or another Python kernel) is using the device memory.
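To make that check concrete, a small sketch of how one might compare the allocator's view against nvidia-smi (the helper name `gpu_memory_report` is made up for illustration):

```python
import torch

def gpu_memory_report():
    """Report what this process's PyTorch caching allocator holds, in bytes.
    If both numbers are 0 but nvidia-smi still shows usage for this PID,
    the remainder is the CUDA context; usage under other PIDs belongs to
    other applications or kernels."""
    if not torch.cuda.is_available():
        return {"allocated": 0, "reserved": 0}
    return {
        "allocated": torch.cuda.memory_allocated(),  # bytes in live tensors
        "reserved": torch.cuda.memory_reserved(),    # bytes held by the cache
    }

# For a full per-pool breakdown, print the built-in summary:
# print(torch.cuda.memory_summary())
```

Note that these counters only cover the current process; nvidia-smi lists every process on the device, so the per-PID column there is the right place to see who owns the remaining 15 GB.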