Question about GPU memory usage

Hello, I am training different models on the four GPUs I have, and there is something I do not understand in the output of nvidia-smi.
Here is what I get:

| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|    0      7126      C   python                                      5773MiB |
|    0      8866      C   python                                       343MiB |
|    0      9130      C   python                                       343MiB |
|    1      8866      C   python                                      5773MiB |
|    2      9130      C   python                                      5773MiB |

I am currently using only three GPUs, and what I don’t understand is why two extra python processes show up on the first GPU (0), each using 343 MiB. Their PIDs (8866 and 9130) are the same as those of the processes training on GPUs 1 and 2. These entries appear only when I am also training models on GPUs 1 and 2; when I only use GPU 0, the output is just the first line, with 5773 MiB in use.
It’s not a problem at the moment, but I would like to understand why this happens and whether it may be a sign that I am doing something wrong.
I am not using DataParallel or multiprocessing; these are just three separate models in three separate processes, each meant to run on its own GPU.
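The process table above can be checked mechanically: grouping its rows by PID shows which processes hold memory on more than one GPU, even though each is meant to use only one. Below is a minimal sketch, assuming the text layout shown above (other nvidia-smi versions may print different columns); the function names are hypothetical.

```python
import re
from collections import defaultdict

# One row of the "Processes" table in the layout shown above, e.g.
# "|    0      8866      C   python          343MiB |"
# Assumption: other nvidia-smi versions may use different columns.
ROW = re.compile(r"^\|\s+(\d+)\s+(\d+)\s+\S+\s+(.+?)\s+(\d+)MiB\s*\|$")


def gpus_per_pid(table: str) -> dict:
    """Map each PID to the list of (gpu_index, used_MiB) rows it appears in."""
    usage = defaultdict(list)
    for line in table.splitlines():
        match = ROW.match(line.strip())
        if match:  # header lines do not match and are skipped
            gpu, pid, _name, mib = match.groups()
            usage[int(pid)].append((int(gpu), int(mib)))
    return dict(usage)


def multi_gpu_pids(table: str) -> dict:
    """Keep only the PIDs that hold memory on more than one GPU."""
    return {pid: rows for pid, rows in gpus_per_pid(table).items() if len(rows) > 1}


if __name__ == "__main__":
    sample = """\
|    0      7126      C   python                                      5773MiB |
|    0      8866      C   python                                       343MiB |
|    0      9130      C   python                                       343MiB |
|    1      8866      C   python                                      5773MiB |
|    2      9130      C   python                                      5773MiB |"""
    print(multi_gpu_pids(sample))
    # {8866: [(0, 343), (1, 5773)], 9130: [(0, 343), (2, 5773)]}
```

On the output above this flags PIDs 8866 and 9130, each holding 343 MiB on GPU 0 in addition to its own GPU. A fixed, few-hundred-MiB footprint like that is consistent with each process having initialised a CUDA context on GPU 0 as well.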
Thank you in advance for your answers and pointers.
Have a good day! :smile:

Hi, I think what’s most likely happening is that somewhere in the processes meant for GPUs 1 and 2 you are calling CUDA without explicitly selecting the desired GPU, so the call goes to the default device, GPU 0. Or, come to think of it, more likely you are saving or loading the model in that process without mapping it to the desired GPU. It would be easier to pinpoint if you provided some code to look at. :slight_smile:
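One common way to rule out accidental allocations on GPU 0 (not something used in this thread, so treat it as an alternative) is to start each training process with the environment variable CUDA_VISIBLE_DEVICES set to a single GPU index. The process then sees only that physical GPU, and inside the process it is numbered 0. Setting the variable at launch guarantees it is in place before CUDA is initialised. A minimal launcher sketch; train.py and the helper names are hypothetical placeholders.

```python
import os
import subprocess
import sys


def env_for_gpu(gpu_index: int, base_env=None) -> dict:
    """Return a copy of the environment in which only one physical GPU is visible."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def build_jobs(script: str, gpus) -> list:
    """One (command, env) pair per GPU; `script` is a hypothetical training script."""
    return [([sys.executable, script], env_for_gpu(gpu)) for gpu in gpus]


def launch(jobs) -> list:
    """Start every job in its own process and return the Popen handles."""
    return [subprocess.Popen(cmd, env=env) for cmd, env in jobs]
```

For example, launch(build_jobs("train.py", [0, 1, 2])) starts three processes, and each one would use torch.device("cuda", 0) internally. From a shell, the equivalent for a single run is CUDA_VISIBLE_DEVICES=1 python train.py.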

Thank you for the answer.
As for assigning to the GPU, I have a check after the creation of the model to send it to the device:

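    import torch

    # gpu_to_use is set elsewhere in the script (-1 means CPU)
    cuda = torch.cuda.is_available()
    if cuda and gpu_to_use > -1:
        use_cuda = True
        device = torch.device("cuda", gpu_to_use)
        print("Using CUDA.")
    else:
        use_cuda = False
        device = torch.device("cpu")
        print("Using CPU.")
    net = ...  # model construction is cut off in the original post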

Once I’ve done this, I pass the device to the training and testing functions, which basically do something like this (in the loop over batches):

    out = net(sample_batched['tensor'].to(device).view(batch_size, -1, max_size * 20))
    target = sample_batched['interaction'].to(device)
    loss = criterion(out, target)

I forgot to mention that I am currently working with PyTorch 0.4.0, in case that is of any help.

torch.cuda.device is a context manager class, so best practice is to use it with a “with” statement. For example, your line here:


On its own, that line only creates the context manager without ever entering it, so it effectively does nothing for you.
You want to use it like this:

    with torch.cuda.device(gpu_to_use):
        net = model.cuda()

That way, any call to CUDA will go to the desired GPU as long as it is inside the with-block.
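To see why a bare torch.cuda.device(gpu_to_use) statement has no effect, here is a toy model of a device context manager in plain Python. It is not PyTorch's implementation; ToyDevice, current_device and _current are hypothetical names. Constructing the object only records the index, the switch happens in __enter__, and __exit__ restores the previous device, so code after the with-block is back on the default device 0.

```python
_current = 0  # the default device index, as with CUDA


def current_device() -> int:
    return _current


class ToyDevice:
    """Toy stand-in for a device context manager; NOT PyTorch's actual code."""

    def __init__(self, idx: int):
        self.idx = idx  # constructing the object only records the index
        self.prev = None

    def __enter__(self):
        global _current
        self.prev = _current
        _current = self.idx  # the switch happens only when the block is entered
        return self

    def __exit__(self, exc_type, exc, tb):
        global _current
        _current = self.prev  # the previous device is restored on exit
        return False  # do not swallow exceptions


ToyDevice(2)                 # bare statement: object created and discarded
print(current_device())      # 0
with ToyDevice(2):
    print(current_device())  # 2
print(current_device())      # 0
```

The practical consequence is that anything touching CUDA before or after the with-block uses device 0 unless the device is passed explicitly (as with .to(device) above), or unless the process-wide default is changed once with torch.cuda.set_device(gpu_to_use) at the start of the script.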
Could you possibly provide a link to the full script? I can’t see exactly what is going wrong from what you provided.

Thank you for that information.
I can’t provide a link to the full script because it is quite long, but here is a partial one showing the call that trains the model.

This gist has two revisions: the first one is without the changes based on what you said about torch.cuda.device, and the second, current one includes the modification with

    with torch.cuda.device(gpu_to_use):

but I am not entirely sure whether it should go somewhere else.