Model.cuda() uses more memory than the model size

Hi,

I’ve run nothing but the following:

>>> import torch
>>> m = torch.nn.Linear(10, 10)
>>> m.cuda()

and I’ve checked the model’s size in MiB with

>>> sum(p.numel() * p.element_size() for p in m.parameters()) / 1024 / 1024
0.00041961669921875

but nvidia-smi says:

Wed Aug 18 00:57:01 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.142.00   Driver Version: 450.142.00   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   67C    P0    41W /  70W |   1072MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     28553      C   python                           1069MiB |
+-----------------------------------------------------------------------------+

Why does the model take more memory than its size when it is moved to CUDA?

Oh, am I supposed to use torch.cuda.memory_allocated()?

The result shows 1024 bytes.

The memory reported by nvidia-smi includes the memory allocated by PyTorch as well as the CUDA context, which can use roughly 600-1000 MB depending on the GPU, CUDA version, etc. The 1024 bytes from torch.cuda.memory_allocated() are also expected: the caching allocator rounds each allocation up to a multiple of 512 bytes, so the 400-byte weight and the 40-byte bias each occupy one 512-byte block.
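
If you want to see the split yourself, you can compare PyTorch’s allocator statistics against the device-level numbers. A minimal sketch (torch.cuda.mem_get_info needs a fairly recent PyTorch release, and the exact figures will vary per setup):

>>> import torch
>>> m = torch.nn.Linear(10, 10).cuda()       # the first CUDA op also creates the context
>>> torch.cuda.memory_allocated()            # bytes held by tensors: two 512-byte blocks
1024
>>> torch.cuda.memory_reserved()             # bytes cached by the allocator (one 2 MiB segment)
2097152
>>> free, total = torch.cuda.mem_get_info()  # device-wide view, closer to what nvidia-smi reports
>>> (total - free) / 1024**2                 # MiB in use: context + reserved memory (+ other processes)

torch.cuda.memory_summary() also prints a detailed breakdown of the allocator state if you want to dig further.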