Some questions about CUDA usage

Hi, I am trying to understand GPU memory usage when training neural networks with CUDA.

My question concerns the memory usage before and after training. Before training, I would expect only the model to occupy GPU memory. After training, if we delete all the other data, memory usage should not change much.

However, if I do not run torch.cuda.empty_cache(), the memory usage stays quite large. Even after I call this method, the memory usage is still larger than the original footprint.
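For reference, this is roughly what I am doing (the model and tensor sizes here are just placeholders for my actual setup):

```python
import torch
import torch.nn as nn

device = torch.device("cuda")

# Placeholder model; the real one is larger.
model = nn.Linear(4096, 4096).to(device)
# (check nvidia-smi here: only the model and the CUDA context are resident)

# One "training" step with throwaway data.
x = torch.randn(512, 4096, device=device)
loss = model(x).sum()
loss.backward()

# Delete everything except the model.
del x, loss
# (check nvidia-smi here: usage is still much higher than right after loading the model)

torch.cuda.empty_cache()
# (check nvidia-smi here: usage drops, but not back to the original value)
```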

Can anyone please help me? Thanks a lot.

The behavior you are seeing is normal and expected because PyTorch uses a caching allocator for GPU memory. The caching allocator will often allocate more memory than is needed and hold on to it for longer than it is used during the execution of a PyTorch program, because it can be expensive to call cudaMalloc and cudaFree repeatedly as tensors (such as inputs and model activations) are created and destroyed during training. By caching allocations, memory can be reused immediately. However, as you noticed, if you do not call empty_cache(), the cached memory will still be held by the underlying process.
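As a rough illustration of the difference between memory held by live tensors and memory held by the cache (the tensor size here is arbitrary):

```python
import torch

def mb(x):
    return x / 1024**2

# Allocate a large temporary tensor on the GPU (~1 GB of float32).
t = torch.randn(1024, 1024, 256, device="cuda")
print(f"allocated: {mb(torch.cuda.memory_allocated()):.0f} MB, "
      f"reserved: {mb(torch.cuda.memory_reserved()):.0f} MB")

# Deleting the tensor returns it to the caching allocator, but the memory
# stays reserved by the process (and still shows up in nvidia-smi).
del t
print(f"allocated: {mb(torch.cuda.memory_allocated()):.0f} MB, "
      f"reserved: {mb(torch.cuda.memory_reserved()):.0f} MB")

# empty_cache() releases the cached, unused blocks back to the driver.
torch.cuda.empty_cache()
print(f"allocated: {mb(torch.cuda.memory_allocated()):.0f} MB, "
      f"reserved: {mb(torch.cuda.memory_reserved()):.0f} MB")
```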


OK, thanks a lot. I will call this function after each training run.

The caching allocator will often allocate more memory than is needed and hold on to it for longer than it is used during the execution of a PyTorch program

How do I get the exact GPU memory usage then? Is it torch.cuda.memory_allocated?

If you mean memory used by tensors, then yes, torch.cuda.memory_allocated would be accurate. Note that it does not include CUDA context memory or memory used by library kernels such as those in cuBLAS or cuDNN.
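For example, a small sketch comparing the allocator statistics with the driver-level view (torch.cuda.mem_get_info wraps cudaMemGetInfo and is available in recent PyTorch versions; the exact numbers will vary across setups):

```python
import torch

torch.cuda.init()  # make sure the CUDA context exists

x = torch.randn(1000, 1000, device="cuda")

allocated = torch.cuda.memory_allocated()  # bytes held by live tensors
reserved = torch.cuda.memory_reserved()    # bytes held by the caching allocator
free, total = torch.cuda.mem_get_info()    # driver-level view of the whole device

print(f"tensors: {allocated / 1024**2:.1f} MB")
print(f"cached:  {reserved / 1024**2:.1f} MB")
print(f"device in use (incl. context and other processes): {(total - free) / 1024**2:.1f} MB")
```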
