PyTorch does not use available GPU memory

I am trying to train a neural network that uses ResNet-152 as its backbone, but I keep getting a CUDA out-of-memory error. To let PyTorch use as much memory as possible, I added the code fragment below:

torch.cuda.set_per_process_memory_fraction(1., 0)

However, I am still unable to train my model: PyTorch has already allocated about 6 GiB and fails trying to allocate a further 58.00 MiB, even though initially more than 7 GiB of my GPU's memory was unused.

RuntimeError: CUDA out of memory. 
Tried to allocate 58.00 MiB (GPU 0; 7.80 GiB total capacity; 6.05 GiB already allocated; 
48.94 MiB free; 7.80 GiB allowed; 6.19 GiB reserved in total by PyTorch)
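The numbers in the error message already hint at where the memory went. Subtracting PyTorch's reserved pool and the remaining free memory from the total capacity gives roughly what other processes and the CUDA runtime itself are holding (a back-of-envelope sketch using the values from the message above):

```python
# Back-of-envelope breakdown of the numbers in the OOM message (all in MiB).
GIB = 1024  # MiB per GiB

total    = 7.80 * GIB   # total capacity reported for GPU 0
reserved = 6.19 * GIB   # reserved in total by PyTorch's caching allocator
free     = 48.94        # free device memory at the time of the failure

# Memory consumed outside PyTorch's pool: the CUDA context, cuBLAS/cuDNN
# library code, plus the Xorg/gnome-shell processes visible in nvidia-smi.
outside = total - reserved - free
print(f"used outside PyTorch: {outside:.0f} MiB (~{outside / GIB:.2f} GiB)")
```

So roughly 1.6 GiB is unavailable to the training process no matter what fraction is passed to `set_per_process_memory_fraction`.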

This is the output I get from the nvidia-smi command:

| N/A   47C    P8     6W /  N/A |    362MiB /  7982MiB |     10%      Default |
|                               |                      |                  N/A |
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|    0   N/A  N/A       947      G   /usr/lib/xorg/Xorg                 70MiB |
|    0   N/A  N/A      1549      G   /usr/lib/xorg/Xorg                159MiB |
|    0   N/A  N/A      1722      G   /usr/bin/gnome-shell               34MiB |
|    0   N/A  N/A      6506      G   ...AAAAAAAAA= --shared-files       85MiB |

How can I increase the 6.19 GiB reserved in total by PyTorch so that training can use
more of my GPU's memory? Thank you!

OS: Ubuntu 20.04

GPU: Nvidia GeForce RTX 2070 Super Max-Q Design

PyTorch version: 1.8.1+cu111

Cuda toolkit: 11.2

Nvidia Cuda driver: 460.80

Beyond killing the desktop processes (only advisable if you are using the machine headless, e.g., over ssh), I'm not sure there is much more you can do. If you want to try this, run something like sudo service gdm stop.

But why? For instance, while the model is training, I can load another model from a Jupyter kernel to look at some predictions, which takes roughly another 1.3 GB of GPU memory. When I check the GPU afterwards, 7.7 GB of memory is in use while the training and testing processes run together. Because of this, I think that in principle I should be able to allocate more memory to the training process.

At least 800 MiB of GPU memory will be used for PyTorch's native GPU kernels (this happens the first time you call .cuda() on a tensor or on a layer with parameters). Then, the first time you use a cuBLAS kernel (think matrix multiplication on the GPU), another hundred or so MiB is taken up by the cuBLAS libraries. The same happens with cuDNN when kernels such as convolution are called. By the time most of the kernels in a typical model have been exercised, easily over 1 GiB of device memory is consumed by code alone.

In summary, the GPU libraries containing the kernels used by PyTorch are fairly hefty, and they have to go somewhere (in this case device memory).
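A rough tally of those fixed per-process overheads (the cuBLAS figure comes from the paragraph above; the cuDNN figure is my own assumption, and all of these vary with driver and toolkit version):

```python
# Approximate fixed per-process GPU memory overheads in MiB.
# The cuBLAS/cuDNN figures are assumptions and vary across CUDA versions.
overheads = {
    "CUDA context + PyTorch native kernels": 800,  # first .cuda() call
    "cuBLAS code/workspaces": 100,                 # first matmul on GPU
    "cuDNN code": 150,                             # first convolution (assumed)
}
fixed_total = sum(overheads.values())
print(f"~{fixed_total} MiB of device memory used before any tensor data")
```

That fixed cost is paid by every process that touches the GPU, which is also why running a second Jupyter kernel alongside training consumes noticeably more than just its model weights.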