I recently noticed an interesting scenario that led me to ask this question. I have a float tensor containing 3 elements. Upon moving this tensor to the GPU, nvidia-smi reports 1089MiB of memory consumption. Please see the ipython session below for more details:
$ ipython
Python 3.9.13 (main, Aug 25 2022, 23:26:10)
Type 'copyright', 'credits' or 'license' for more information
IPython 8.4.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import torch, sys
In [2]: torch.__version__
Out[2]: '1.9.0+cu102'
In [3]: torch.cuda.is_available()
Out[3]: True
In [4]: a = torch.ones((1, 3))
In [5]: a.to("cuda:0")
Out[5]: tensor([[1., 1., 1.]], device='cuda:0')
In [6]: a
Out[6]: tensor([[1., 1., 1.]])
In [7]: a.dtype
Out[7]: torch.float32
In [8]: # Source: https://discuss.pytorch.org/t/how-to-know-the-memory-allocated-for-a-tensor-on-gpu/28537/2
In [9]: a.element_size() * a.nelement()
Out[9]: 12
In [10]: # Source: https://stackoverflow.com/a/54365012
In [11]: sys.getsizeof(a.storage())
Out[11]: 68
Theoretically, a C array of type float with 3 elements should take 4 bytes per float * 3 floats = 12 bytes of memory. However, I understand that a tensor carries extra metadata, so it will consume somewhat more memory than a bare C array.
Based on the two sources linked in the session above, I computed the size of the allocated memory in two ways, and as you can see, the two results are not equal: 12 != 68.
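To make the two measurements explicit, here is how I read them (a small sketch repeating the checks from the session above; the interpretation of getsizeof is my assumption, not something stated in the linked answers):

import sys
import torch

a = torch.ones((1, 3))  # 3 elements of float32

# Raw data: bytes per element * number of elements.
print(a.element_size())                  # 4 bytes for float32
print(a.nelement())                      # 3 elements
print(a.element_size() * a.nelement())   # 12 bytes of actual data

# sys.getsizeof() measures the Python storage object, so (if I
# understand it correctly) it also counts the interpreter-level
# object overhead on top of the 12 raw bytes, hence the 68 above.
print(sys.getsizeof(a.storage()))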
On the other hand, when I moved this tensor to the GPU, nvidia-smi reported 1089MiB of memory consumption for the process. Below is the output of the nvidia-smi command:
$ nvidia-smi
Tue Oct 4 16:08:54 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.23.05 Driver Version: 455.23.05 CUDA Version: 11.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce RTX 208... On | 00000000:01:00.0 On | N/A |
| N/A 50C P8 13W / N/A | 2513MiB / 7982MiB | 2% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1554 G /usr/lib/xorg/Xorg 160MiB |
| 0 N/A N/A 2820 G /usr/lib/xorg/Xorg 665MiB |
| 0 N/A N/A 3001 G /usr/bin/gnome-shell 105MiB |
| 0 N/A N/A 3614 G ...763400436228628087,131072 397MiB |
| 0 N/A N/A 39131 G ...RendererForSitePerProcess 78MiB |
| 0 N/A N/A 141097 C ...conda/envs/ray/bin/python 1089MiB |
+-----------------------------------------------------------------------------+
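For comparison, one could also ask PyTorch itself how much memory its allocator holds on the device (a minimal sketch; I have not pasted its output here, and these counters are reported by PyTorch rather than by nvidia-smi):

import torch

b = torch.ones((1, 3)).to("cuda:0")  # the same 3-element float32 tensor, on the GPU

# PyTorch's own allocator statistics, in bytes:
print(torch.cuda.memory_allocated("cuda:0"))  # memory currently occupied by tensors
print(torch.cuda.memory_reserved("cuda:0"))   # memory reserved by the caching allocator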
GPU memory is more precious than anything else in the world!!! This is why I can’t digest the memory usage reported by nvidia-smi.
What’s wrong here?