Calling cuda(1) on a FloatTensor takes up memory on all GPUs

Hi,

When I call cuda(1) on a FloatTensor, memory gets taken up on all available GPUs. I specify that the tensor should be created on GPU 1, but this has no effect.

I ran the code:

import torch

if __name__ == '__main__':
    # Intended to allocate the tensor only on GPU 1
    a = torch.FloatTensor(10, 20).cuda(1)

    print('Done!!!')

And the result of nvidia-smi is:

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1281      G   /usr/lib/xorg/Xorg                            37MiB |
|    0      1374      G   /usr/bin/gnome-shell                          49MiB |
|    0      2712      G   /usr/lib/xorg/Xorg                           317MiB |
|    0      2880      G   /usr/bin/gnome-shell                         235MiB |
|    0      2892      G   /opt/teamviewer/tv_bin/TeamViewer              2MiB |
|    0      3280      G   ...-token=00C0BADC3EA2964E642C4E1639C06C6D    80MiB |
|    0      3458      G   ...-token=0700AC55763E12A78BD21197AA676191   165MiB |
|    0      7932      C   ...n/.virtualenvs/torch_gpu_0.4/bin/python   549MiB |
|    1      7932      C   ...n/.virtualenvs/torch_gpu_0.4/bin/python   549MiB |
+-----------------------------------------------------------------------------+

Is this not the right way to select a GPU? Also, if I set CUDA_VISIBLE_DEVICES=1, then it works (but I then have to use a GPU id of 0, since only one device is visible). For completeness, the workaround I used looks roughly like the sketch below.
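The environment variable has to be set before CUDA is initialized, so in this sketch it is set before importing torch:

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1'  # hide GPU 0 before CUDA is initialized

import torch

if __name__ == '__main__':
    # Only physical GPU 1 is visible now, so it is addressed as device 0
    a = torch.FloatTensor(10, 20).cuda(0)
    print('Done!!!')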

Thanks,
Shirin

PyTorch will create a context on GPU 0 even if you don’t use it. See https://github.com/pytorch/pytorch/issues/3477 for more details.
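If you can’t upgrade, another workaround besides CUDA_VISIBLE_DEVICES is to make GPU 1 the current device before any allocation (a sketch; whether this avoids the extra context on GPU 0 depends on your version):

import torch

torch.cuda.set_device(1)              # select GPU 1 before any CUDA allocation
a = torch.FloatTensor(10, 20).cuda()  # .cuda() with no index uses the current device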

The issue you linked mentions that PyTorch creates a context of roughly 250MB, irrespective of the size of the variables I create. In my case, though, it looks like the same variable is being created on both GPUs: in the nvidia-smi output above, process 7932 takes up 549MiB on each GPU. Is this the intended behavior?

A (10, 20) tensor certainly shouldn’t be taking 300MB on its own, and the 250MB context figure might not hold for all GPU models.
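For scale, the tensor data itself is negligible, so essentially all of what nvidia-smi reports is the CUDA context:

# A (10, 20) float32 tensor holds 10 * 20 elements of 4 bytes each,
# i.e. 800 bytes, so the hundreds of MiB in nvidia-smi are context overhead.
print(10 * 20 * 4)  # 800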

Can you see how much memory is being used if you just do a = torch.tensor([], device='cuda:0')?
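You can also compare what PyTorch itself thinks it has allocated; torch.cuda.memory_allocated() counts only tensor allocations, not the context that nvidia-smi includes (a sketch, assuming that API is available in your version):

import torch

a = torch.tensor([], device='cuda:0')
# Bytes currently allocated for tensors on device 0; should be (close to) 0
# for an empty tensor, while nvidia-smi still shows the full context.
print(torch.cuda.memory_allocated(0))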

Thanks Roy. I did what you suggested. Below is the output of nvidia-smi:

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1324      G   /usr/lib/xorg/Xorg                            37MiB |
|    0      1419      G   /usr/bin/gnome-shell                          50MiB |
|    0      1725      G   /usr/lib/xorg/Xorg                           338MiB |
|    0      1893      G   /usr/bin/gnome-shell                         190MiB |
|    0      1905      G   /opt/teamviewer/tv_bin/TeamViewer             21MiB |
|    0      2285      G   ...-token=E170E820957801A9C1BEAF1C38D248C6    45MiB |
|    0      2630      G   ...-token=66DB471AF199475291E8749402E957B5   153MiB |
|    0     21478      C   ...n/.virtualenvs/torch_gpu_0.4/bin/python   549MiB |
|    1      6413      C   ...n/.virtualenvs/torch_gpu_0.4/bin/python   703MiB |
|    1     21041      C   ...n/.virtualenvs/torch_gpu_0.4/bin/python   703MiB |
+-----------------------------------------------------------------------------+

The process with PID 21478 is the one created by a = torch.tensor([], device='cuda:0'). That is a lot of memory for an empty tensor!

I don’t think we can really do much about the ~550MiB context, but on master I can’t reproduce the original issue of memory being taken on all GPUs (I think someone fixed it at some point, but I can’t find the PR for some reason).

I am guessing that means the issue will be fixed in the next stable release of PyTorch. I am using 0.4 right now (not master). Anyway, thanks for looking into it.