When I call cuda(1) on a FloatTensor, memory ends up being taken on all available GPUs. I specify that the tensor should be created on GPU 1, but this has no effect.
I ran the code:
import torch

if __name__ == '__main__':
    a = torch.FloatTensor(10, 20).cuda(1)
    print('Done!!!')
And the result of nvidia-smi is:
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1281 G /usr/lib/xorg/Xorg 37MiB |
| 0 1374 G /usr/bin/gnome-shell 49MiB |
| 0 2712 G /usr/lib/xorg/Xorg 317MiB |
| 0 2880 G /usr/bin/gnome-shell 235MiB |
| 0 2892 G /opt/teamviewer/tv_bin/TeamViewer 2MiB |
| 0 3280 G ...-token=00C0BADC3EA2964E642C4E1639C06C6D 80MiB |
| 0 3458 G ...-token=0700AC55763E12A78BD21197AA676191 165MiB |
| 0 7932 C ...n/.virtualenvs/torch_gpu_0.4/bin/python 549MiB |
| 1 7932 C ...n/.virtualenvs/torch_gpu_0.4/bin/python 549MiB |
+-----------------------------------------------------------------------------+
Is this not the right way to select a GPU? Also, if I set CUDA_VISIBLE_DEVICES=1, then it works (in that case I have to pass a GPU id of 0).
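For reference, this is roughly the workaround I mean (a minimal sketch; the environment variable has to be set before torch initializes CUDA):

import os
# Hide all GPUs except physical GPU 1; must happen before CUDA is initialized.
os.environ['CUDA_VISIBLE_DEVICES'] = '1'

import torch

if __name__ == '__main__':
    # Physical GPU 1 is now visible to this process as device 0.
    a = torch.FloatTensor(10, 20).cuda(0)
    print('Done!!!')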
In the post, it is mentioned that PyTorch creates a CUDA context of roughly 250MB per GPU, irrespective of the size of the variables I create. In my case, though, it looks like the same allocation shows up on both GPUs: in the nvidia-smi output above, process 7932 takes up 549MiB on GPU 0 and on GPU 1. Is this the intended behavior?
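One way I tried to separate the tensor memory from the context overhead (a rough sketch; torch.cuda.memory_allocated only reports memory held by tensors, not the CUDA context that nvidia-smi shows):

import torch

a = torch.FloatTensor(10, 20).cuda(1)

# Tensor memory tracked by PyTorch on each device, in bytes.
# The CUDA context itself is not included in these numbers.
print(torch.cuda.memory_allocated(0))  # expected: 0 on GPU 0
print(torch.cuda.memory_allocated(1))  # a few KB for the 10x20 tensor on GPU 1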
Thanks Roy. I did what you suggested. Below is the output of nvidia-smi:
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1324 G /usr/lib/xorg/Xorg 37MiB |
| 0 1419 G /usr/bin/gnome-shell 50MiB |
| 0 1725 G /usr/lib/xorg/Xorg 338MiB |
| 0 1893 G /usr/bin/gnome-shell 190MiB |
| 0 1905 G /opt/teamviewer/tv_bin/TeamViewer 21MiB |
| 0 2285 G ...-token=E170E820957801A9C1BEAF1C38D248C6 45MiB |
| 0 2630 G ...-token=66DB471AF199475291E8749402E957B5 153MiB |
| 0 21478 C ...n/.virtualenvs/torch_gpu_0.4/bin/python 549MiB |
| 1 6413 C ...n/.virtualenvs/torch_gpu_0.4/bin/python 703MiB |
| 1 21041 C ...n/.virtualenvs/torch_gpu_0.4/bin/python 703MiB |
+-----------------------------------------------------------------------------+
The process with PID 21478 is the one that was created as a result of a = torch.tensor([], device='cuda:0'). That is a lot of memory for an empty tensor!!
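For completeness, this is roughly how I am checking where things end up now (a minimal sketch; I am relying on torch.cuda.set_device and the device= argument to keep allocations on a single GPU):

import torch

# Make GPU 1 the default device for this process.
torch.cuda.set_device(1)

a = torch.tensor([], device='cuda:1')
b = torch.FloatTensor(10, 20).cuda()  # .cuda() with no argument uses the current device

print(a.device, b.device)  # both should report cuda:1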
I don’t think we can really do much about the 550MB, but I can’t reproduce the original issue of taking memory on all GPUs on master (I think someone fixed it at some point, but I can’t find the PR for some reason).
I am guessing that means this issue will be fixed in the next stable version of PyTorch. I am using 0.4 right now (not master). Anyway, thanks for looking into the issue.