Why does PyTorch consume so much memory for a small allocation?

Hi,

I am running PyTorch with a simple CUDA allocation:

import torch
a=torch.zeros((1,3,352,640), dtype=torch.float32).cuda()

The amount of memory should be (1 * 3 * 352 * 640 * 4 bytes / 1,000,000) = 2.7 MB.
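As a quick sanity check of that estimate, here is a minimal sketch in plain Python. It assumes 4 bytes per float32 element; the variable names are only illustrative. With PyTorch available, `a.element_size() * a.nelement()` should give the same byte count.

```python
import math

# Shape of the tensor from the question: (batch, channels, height, width)
shape = (1, 3, 352, 640)
bytes_per_element = 4  # float32 is 4 bytes wide

# Total number of elements is the product of all dimensions
num_elements = math.prod(shape)

# Raw payload size of the tensor, excluding any allocator or context overhead
tensor_bytes = num_elements * bytes_per_element

print(f"{num_elements} elements, {tensor_bytes / 1_000_000:.1f} MB")
```

So the tensor itself is tiny, and anything beyond this in nvidia-smi comes from somewhere other than the tensor data.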

However, nvidia-smi reports about 591 MB in use.

CPU memory usage also increases by 1.2 GB.

This is unacceptable when running on an edge device.

Any ideas?

Most of the memory allocated on the GPU is used by the CUDA context, which is created during the first CUDA operation. You could lower it by building from source for your GPU architecture and removing unnecessary libraries (such as NCCL, in case you don’t need it).
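To see how little of the nvidia-smi figure comes from tensors, you can compare it with what PyTorch's own allocator reports. This is a rough sketch, not an exact accounting: the helper name `cuda_memory_report` is made up for illustration, and it returns None when PyTorch or a GPU is not available.

```python
try:
    import torch
except ImportError:  # allows the sketch to run on a machine without PyTorch
    torch = None


def cuda_memory_report():
    """Return the allocator's view of GPU memory after one small allocation.

    Hypothetical helper for illustration; returns None without PyTorch or a GPU.
    """
    if torch is None or not torch.cuda.is_available():
        return None

    # The first CUDA operation creates the CUDA context
    a = torch.zeros((1, 3, 352, 640), dtype=torch.float32, device="cuda")
    torch.cuda.synchronize()

    return {
        # Bytes currently occupied by live tensors
        "allocated_bytes": torch.cuda.memory_allocated(),
        # Bytes held by PyTorch's caching allocator (at least the allocated amount)
        "reserved_bytes": torch.cuda.memory_reserved(),
        # Payload size of the tensor itself, for comparison
        "tensor_bytes": a.element_size() * a.nelement(),
    }


report = cuda_memory_report()
if report is None:
    print("PyTorch with CUDA is not available on this machine.")
else:
    for name, value in report.items():
        print(f"{name}: {value / 1_000_000:.1f} MB")
```

The gap between the reserved amount printed here and the number shown by nvidia-smi for the process is roughly the CUDA context and the loaded kernels, which is the part that a slimmer build can reduce.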

Please see "Building pytorch wheel with cmake".