All GPUs used when relocating tensor to CUDA

Hi, all

I noticed an issue with my current setup. When I simply execute the following code snippet, the process shows up on all GPUs. Specifically, cuda:0 is where the data is relocated to; the other GPUs report zero memory usage, but the process is still listed on them for some reason.

import torch
a = torch.rand(10)  # tensor created on the CPU
b = a.cuda()        # moved to the default CUDA device (cuda:0)

Below are the GPU usage stats reported by nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.32.00    Driver Version: 455.32.00    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  On   | 00000000:04:00.0 Off |                  N/A |
| 22%   31C    P2    68W / 250W |    519MiB / 12212MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX TIT...  On   | 00000000:05:00.0 Off |                  N/A |
| 22%   27C    P8    15W / 250W |      4MiB / 12212MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX TIT...  On   | 00000000:08:00.0 Off |                  N/A |
| 22%   28C    P8    15W / 250W |      4MiB / 12212MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX TIT...  On   | 00000000:09:00.0 Off |                  N/A |
| 22%   26C    P8    15W / 250W |      4MiB / 12212MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   4  GeForce GTX TIT...  On   | 00000000:85:00.0 Off |                  N/A |
| 22%   27C    P8    14W / 250W |      4MiB / 12212MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   5  GeForce GTX TIT...  On   | 00000000:86:00.0 Off |                  N/A |
| 22%   27C    P8    15W / 250W |      4MiB / 12212MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   6  GeForce GTX TIT...  On   | 00000000:89:00.0 Off |                  N/A |
| 22%   24C    P8    15W / 250W |      4MiB / 12212MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   7  GeForce GTX TIT...  On   | 00000000:8A:00.0 Off |                  N/A |
| 22%   27C    P8    15W / 250W |      4MiB / 12212MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     32155      C   python                            514MiB |
|    1   N/A  N/A     32155      C   python                              0MiB |
|    2   N/A  N/A     32155      C   python                              0MiB |
|    3   N/A  N/A     32155      C   python                              0MiB |
|    4   N/A  N/A     32155      C   python                              0MiB |
|    5   N/A  N/A     32155      C   python                              0MiB |
|    6   N/A  N/A     32155      C   python                              0MiB |
|    7   N/A  N/A     32155      C   python                              0MiB |
+-----------------------------------------------------------------------------+

Other information:
OS: Ubuntu 20.04.1 LTS
PyTorch: 1.7.1 (other versions seem to have the issue too)
CUDA version: 11.1
Driver Version: 455.32.00
Hardware: 8x GeForce GTX TITAN X

Could someone please let me know what is going on?

Many thanks,
Fred

CUDA initialization sees all devices and might thus allocate the small amount of memory on each of them. You can hide the other devices by running CUDA_VISIBLE_DEVICES=0 python script.py args to make only certain GPUs visible inside your script.
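In case it helps, the same restriction can also be applied from inside the script by setting the environment variable before the first CUDA call; a minimal sketch (safest is to set it before importing torch):

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"   # must be set before CUDA is initialized

import torch
a = torch.rand(10)
b = a.cuda()                               # only GPU 0 is visible to this process
print(torch.cuda.device_count())           # prints 1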

Hi @ptrblck

Thanks for the reply. I’m actually using DistributedDataParallel, and because of this issue each spawned process occupies a small amount of memory on every GPU, which causes the whole run to hang.

If you are using the recommended mode of one GPU per process, you could use the launch scripts to make sure each process only sees a single GPU.
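For reference, a minimal one-GPU-per-process sketch under torch.distributed.launch (script name and arguments are placeholders; the launcher supplies --local_rank and the rendezvous environment variables):

# launched with something like:
#   python -m torch.distributed.launch --nproc_per_node=8 train.py
import argparse
import torch
import torch.distributed as dist

parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=0)   # filled in by the launcher
args = parser.parse_args()

torch.cuda.set_device(args.local_rank)                     # bind this process to one GPU
dist.init_process_group(backend="nccl", init_method="env://")
device = torch.device("cuda", args.local_rank)
x = torch.rand(10, device=device)                          # stays on this process's GPU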

Yeah, I’m using one GPU per process, but the code is written so that the processes are spawned in the __main__ function (roughly the layout sketched below). Although it wouldn’t be hard to modify it to use the launch script, I’d really like to understand why all CUDA devices are used by a single process. It used to be fine: only cuda:0 was occupied when the device id was not specified. But after a system update I re-installed CUDA and PyTorch (same versions), and now it behaves like this. It seems very peculiar to me.
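For context, the spawn layout looks roughly like this (a simplified sketch; the per-worker CUDA_VISIBLE_DEVICES line is the workaround I’d have to add to hide the other devices before the first CUDA call):

import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def worker(rank, world_size):
    # workaround: hide the other devices before this process touches CUDA
    os.environ["CUDA_VISIBLE_DEVICES"] = str(rank)
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    x = torch.rand(10).cuda()        # with one visible device, this lands on that GPU only
    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = 8                   # 8x TITAN X in this machine
    mp.spawn(worker, args=(world_size,), nprocs=world_size)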

I found the problem. It’s most likely a display bug in driver 455.32.00. I upgraded to CUDA 11.2 with driver version 460.27.04 and everything is fine now.


I’m facing the same problem.

Did you mean the NVIDIA graphics driver?

Yes. Simply upgrading the CUDA toolkit to 11.2 will automatically upgrade the driver too.