I am running PyTorch with a simple CUDA allocation:
import torch
a = torch.zeros((1, 3, 352, 640), dtype=torch.float32).cuda()
The tensor itself should need only 1*3*352*640 elements * 4 bytes (float32) / 1000000 ≈ 2.7 MB.
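As a quick check of that arithmetic, here is a minimal sketch in plain Python. The variable names are illustrative only; the shape and the 4-byte size of float32 come from the snippet above.

```python
import math

# Shape from the allocation above; torch.float32 uses 4 bytes per element.
shape = (1, 3, 352, 640)
bytes_per_element = 4

num_elements = math.prod(shape)              # total number of elements
size_bytes = num_elements * bytes_per_element
size_mb = size_bytes / 1_000_000             # decimal megabytes

print(f"{num_elements} elements, {size_bytes} bytes, {size_mb:.2f} MB")
# prints "675840 elements, 2703360 bytes, 2.70 MB"
```

With the tensor in hand, `a.element_size() * a.nelement()` gives the same byte count directly.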
However, nvidia-smi reports about 591 MB of GPU memory in use.
CPU memory usage also increases by about 1.2 GB.
This is unacceptable when running on an edge device.