No GPU utilization although CUDA seems to be activated

What do you mean by “no GPU usage”? If you are referring to the GPU memory usage of a separate process, then it is clearly stated there: per-process memory is not available in the WDDM driver model. However, the memory usage for the whole GPU should be available through nvidia-smi.
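For example, a quick way to check memory usage for the whole GPU from the command line (the query flags should be available on any reasonably recent driver):

nvidia-smi --query-gpu=memory.used,memory.total --format=csv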

I’m running my PyTorch code (you can take a look at the original post) and I get almost no GPU usage (~1%) when looking in Task Manager (also attached above).
All the CUDA signs from Python seem to be OK, so I’m not sure what’s going on here…
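(By “CUDA signs” I mean the usual sanity checks, something like this minimal sketch, assuming the GPU is device 0:)

import torch

print(torch.cuda.is_available())      # expect: True
print(torch.cuda.device_count())      # expect: >= 1
print(torch.cuda.get_device_name(0))  # the GPU PyTorch will use

Note that these only confirm the CUDA build and driver are visible; they say nothing about whether the code actually runs work on the GPU.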

BTW, when running nvidia-smi -l I see “N/A” under “GPU Memory Usage” for the python process.

@imesery The graph in Task Manager is confusing; it actually doesn’t show anything about GPU memory usage. And the Python output is correct, because at the point you called it you hadn’t stored any tensors on the GPU.

You can simply execute the following piece of code to verify whether it’s working.

>>> import torch
>>> a = torch.cuda.FloatTensor(10000)
>>> print("Allocated:", round(torch.cuda.memory_allocated(0)/10243,1), "GB")
Allocated: 3.9 GB
>>> b = torch.cuda.FloatTensor(20000)
>>> print("Allocated:", round(torch.cuda.memory_allocated(0)/10243,1), "GB")
Allocated: 11.8 GB

But I do store tensors on the GPU; when printing the tensors it says they’re on device='cuda:0'.
Attaching another screenshot of Task Manager during the Python run with the GPU graph enlarged.
The CPU usage went up to ~55% when the run started, but the GPU usage stayed at 0.

import torch

a = torch.cuda.FloatTensor(10000)
print("Allocated:", round(torch.cuda.memory_allocated(0)/10243,1), "GB")

b = torch.cuda.FloatTensor(20000)
print("Allocated:", round(torch.cuda.memory_allocated(0)/10243,1), "GB")

Allocated: 22595.4 GB
Allocated: 22595.4 GB

Yes, it’s because the tensors aren’t large enough to change the rounded value.
Try this one:

torch.rand(20000,20000).cuda()

It will allocate about 1.5 GB (20000 × 20000 floats at 4 bytes each).
P.S. the expression print("Allocated:", round(torch.cuda.memory_allocated(0)/10243,1), "GB") is not correct; I just copied it from your code to show that the value is not zero.

Yes, I fixed the expression to use 1024**3:

torch.rand(20000,20000).cuda()
print("Allocated:", round(torch.cuda.memory_allocated(0)/1024**3,1), "GB")

Allocated: 0.2 GB

You’ll need to save it into a variable, otherwise it will be released.
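A minimal sketch of the difference (device 0 assumed; the absolute numbers depend on whatever else is already allocated):

import torch

torch.rand(20000, 20000).cuda()        # temporary: freed once the expression finishes
print(torch.cuda.memory_allocated(0))  # back near the previous value

c = torch.rand(20000, 20000).cuda()    # bound to a name: stays allocated
print(torch.cuda.memory_allocated(0))  # roughly 1.6e9 bytes (~1.5 GB) higher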

When trying to add this print inside my train_model function (after each data fetch) I get 0.5 GB. I added it after this:
inputs = inputs.to(device)
labels = labels.to(device)
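For context, the print sits roughly here (a sketch; dataloader, model, and device come from my actual training script):

for inputs, labels in dataloader:
    inputs = inputs.to(device)
    labels = labels.to(device)
    # measuring right after the batch lands on the GPU
    print("Allocated:", round(torch.cuda.memory_allocated(0)/1024**3, 1), "GB")
    outputs = model(inputs)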

I don’t know how large your data is, so I can’t say whether that value is correct. But since it’s increasing, there shouldn’t be much trouble.

ran this again:
c = torch.rand(20000,20000).cuda()
print("Allocated:", round(torch.cuda.memory_allocated(0)/1024**3,1), "GB")

and I get 1.5 GB

@imesery: regarding how big the data is: I have now increased the batch size to 128 (224×224×3 images), and I see GPU memory usage, but still only ~1% utilization (apart from a ~1 s peak when initializing the model). Is that normal? Shouldn’t the GPU’s compute resources be doing the actual model calculations? Is it possible that 1% is simply all it needs?

It is normal; see this post for more details:
https://devblogs.microsoft.com/directx/gpus-in-the-task-manager/

>>> import torch
>>> a = torch.rand(20000,20000).cuda()
>>> while True:
...     a += 1
...     a -= 1

By running this code, I can see 100% utilization in nvidia-smi but only about 1% in Task Manager. So it should be okay.


I see… thank you very much! 🙂

Hi there mate! I notice that in the screenshot you are clearly not looking at CUDA utilization! Note the tiny arrow pointing down next to “3D” (and “Copy”, “Video Encode”, “Video Decode”). Click that arrow and pick “Cuda” from near the bottom of the list if you wish to see CUDA activity. I can assure you that if you train a deep network you will in fact see CUDA activity spikes!


Based on what I read here, there is a simple solution for this problem. As you can see in Task Manager, there are four graphs on the GPU page: 3D, Copy, Video Encode, and Video Decode.
Just click on one of them (e.g. Copy) and change it to Cuda.

That’s all.


Did you get Python to use CUDA on the NVIDIA 980 GPU?