More GPU Memory Utilization on 2080Ti

Hi, all. I have trained a detection model on a 1080Ti with PyTorch 1.0.1. The GPU memory usage during inference is about 500MB. However, when I run the same model on a 2080Ti, the GPU memory usage increases to about 850MB. What is the reason, and how can I reduce the extra ~350MB of memory on the 2080Ti?

Could you create a single torch.randn(1).cuda() tensor and check the memory usage via nvidia-smi?
If the ~350MB are visible there, they are coming from the CUDA context and cannot be reduced.
The size of the context depends on the CUDA version, the device being used, etc.
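
E.g. something like this minimal sketch (PyTorch only tracks its own allocations, so the context overhead will only show up in nvidia-smi, not in these numbers):

```
import torch

# Allocate a single tiny tensor to force the CUDA context to be created.
x = torch.randn(1).cuda()

# PyTorch only reports its own allocations (a few hundred bytes here);
# the CUDA context is not included in this value.
print(torch.cuda.memory_allocated() / 1024**2, "MB allocated by PyTorch")

# The total usage including the context is reported by the driver,
# so check `nvidia-smi` in another terminal while this script is alive.
input("Check nvidia-smi now, then press Enter to exit.")
```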

Hi, in my case it costs 345MB of memory on the 1080Ti and 477MB on the 2080Ti.

I also found another thread, "2080ti cost more memory than 1080ti?", which reports 357MB of memory on the 1080Ti and 471MB on the 2080Ti.

Moreover, when I just load the model without any input image, the memory usage on the 1080Ti and 2080Ti already shows a ~300MB gap.

The size of the CUDA context depends on the CUDA version as well as the device, so it’s unfortunately expected.
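
If you want to compare the two setups directly, you could print the library versions and the device name on both machines, e.g. with a small check like this:

```
import torch

# Compare these outputs on the 1080Ti and the 2080Ti machine; a different
# CUDA/cuDNN version or a different device changes the CUDA context size.
print("PyTorch:", torch.__version__)
print("CUDA runtime:", torch.version.cuda)
print("cuDNN:", torch.backends.cudnn.version())
print("Device:", torch.cuda.get_device_name(0))
```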

In other words, is this normal behavior that we cannot avoid?
And thank you very much for the detailed answers above : )

Yes, you cannot avoid the overhead of the CUDA context unfortunately. :frowning:

OK, understood.
Thanks again for your help ~


How can I decrease the overhead of the CUDA context?

One possibility would be to remove specific libraries, such as NCCL, cuDNN, etc., which should reduce the context, as less code has to be stored.
Alternatively, you could try to remove PyTorch CUDA kernels, but this sounds pretty painful and you could run into a lot of failures due to undefined functions.

Thanks for your answer.
How can I remove specific libraries? Is building PyTorch from source without cuDNN the right way to do it?

Yes, you could specify the env vars (e.g. USE_CUDNN=0) and build PyTorch from source.
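
After rebuilding, a quick sanity check could look like this (assuming the USE_CUDNN=0 build went through; the exact build command depends on your setup, a common pattern being to set the env vars before running `python setup.py install` in the PyTorch source tree):

```
import torch

# If the USE_CUDNN=0 build worked, is_available() should return False
# and version() should return None.
print("cuDNN available:", torch.backends.cudnn.is_available())
print("cuDNN version:", torch.backends.cudnn.version())
```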