Is PyTorch not using the GPU for training?

Hi there, I am working on a project called dog_app.py, in a conda environment on a Windows 10 machine.

Although I have (apparently) configured everything to use the GPU, its usage barely goes above 2%. I am moving the model to cuda(), as well as my data. Why is the GPU not being used? How do I debug this?

use_cuda = torch.cuda.is_available()
model_scratch = Net()
if use_cuda:
    model_scratch.cuda()
    print("Let's use", torch.cuda.device_count(), "GPU(s)!")

# Prints "Let's use 1 GPU(s)!"
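A quick sanity check, to confirm the model really ended up on the GPU, is to inspect the device of its parameters (a minimal sketch, using the model_scratch defined above):

# All registered parameters should report a CUDA device after .cuda()
print(next(model_scratch.parameters()).device)   # expected: cuda:0
print(next(model_scratch.parameters()).is_cuda)  # expected: True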

...
def train(n_epochs, loaders, model, optimizer, criterion, use_cuda, save_path):
...
        model.train()
        for batch_idx, (data, target) in enumerate(loaders['train']):
            if use_cuda:
                data, target = data.cuda(), target.cuda()
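                # Debugging sketch (not in the original code): assert that the
                # batch and the model really live on the GPU, to rule out a
                # silent fall-back to the CPU
                assert next(model.parameters()).is_cuda, "model is still on CPU"
                assert data.is_cuda and target.is_cuda, "batch is still on CPU"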

I found this test on another thread on the subject, but allocating memory on the GPU worked just fine:

import torch

# memory_allocated() returns bytes; divide by 1024**2 for MB
# (note: 10,000 float32 values are only ~40 KB, nowhere near GB)
a = torch.cuda.FloatTensor(10000)
print("Allocated:", round(torch.cuda.memory_allocated(0)/1024**2, 2), "MB")

b = torch.cuda.FloatTensor(20000)
print("Allocated:", round(torch.cuda.memory_allocated(0)/1024**2, 2), "MB")

# Output (approximately):
# Allocated: 0.04 MB
# Allocated: 0.12 MB
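If you want a bit more detail than memory_allocated(), newer PyTorch versions (1.4+, if I remember correctly) also expose the caching allocator's view:

print("Allocated:", torch.cuda.memory_allocated(0), "bytes")  # memory occupied by live tensors
print("Reserved: ", torch.cuda.memory_reserved(0), "bytes")   # memory held by the caching allocator
print(torch.cuda.memory_summary(0))                           # full human-readable report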

Hi,

If you're using Windows, you need to be careful, as CUDA compute is not reported in the Task Manager. You will have to check with nvidia-smi from a command line, I think.

Hi @albanD, thanks for your reply. It took me a while to figure out how to use the tool, but it seems I only get short bursts of usage. Is that how it is supposed to work?

C:\Program Files\NVIDIA Corporation\NVSMI>nvidia-smi --format=csv --query-gpu=utilization.gpu,fan.speed,temperature.gpu,power.draw -l 1
utilization.gpu [%], fan.speed [%], temperature.gpu, power.draw [W]
6 %, 0 %, 58, 50.59 W
1 %, 0 %, 59, 135.16 W
1 %, 0 %, 58, 50.50 W
51 %, 0 %, 58, 50.59 W
0 %, 0 %, 58, 144.52 W
0 %, 0 %, 58, 50.15 W
0 %, 0 %, 58, 50.25 W
59 %, 0 %, 59, 50.83 W
0 %, 0 %, 58, 136.92 W
0 %, 0 %, 58, 50.39 W
62 %, 0 %, 59, 50.83 W
0 %, 0 %, 59, 50.39 W
0 %, 0 %, 59, 50.59 W
0 %, 0 %, 61, 62.24 W
0 %, 0 %, 59, 50.49 W
0 %, 0 %, 59, 50.59 W
0 %, 0 %, 60, 50.83 W
0 %, 0 %, 59, 50.49 W
0 %, 0 %, 59, 50.39 W
0 %, 0 %, 60, 50.74 W
0 %, 0 %, 60, 50.83 W
36 %, 0 %, 61, 51.42 W
0 %, 0 %, 60, 50.74 W
0 %, 0 %, 60, 50.74 W 

It will depend a lot on your network. But if it is not too big, or your DataLoader is not fast enough, then yes, that is expected.
You can try adding workers to the DataLoader (num_workers) to make sure data loading is not the bottleneck. Otherwise, increasing the batch size (if you have enough memory) should increase GPU usage.
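Something like this, for example (just a sketch; train_dataset and the numbers are placeholders to adapt):

from torch.utils.data import DataLoader

loaders['train'] = DataLoader(
    train_dataset,     # your existing training Dataset
    batch_size=64,     # larger batches keep the GPU busier, if memory allows
    shuffle=True,
    num_workers=4,     # worker processes that prepare batches in parallel
    pin_memory=True,   # speeds up host-to-GPU copies
)

One caveat on Windows: with num_workers > 0, the training entry point has to be guarded by if __name__ == '__main__':, because the workers are spawned as separate processes.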