Python says cuda is enabled and being used, but my system monitor shows otherwise

I have a fresh installation of PyTorch on my Windows 10 laptop with everything necessary for CUDA usage. I am a new user on the forum so I can't post images, but here is a link to imgur showing that the CUDA versions match.

But when I execute some training, my GPU isn't being used at all by Python. I don't think the problem is in the code, because running the quickstart code gives the same result. Here is my training loop; I used the quickstart to create it:

epocas = 10
for epoca in range(epocas):
    for batch, (X, y) in enumerate(dataloader):
        # Move the batch to the same device as the model
        X, y = X.to(device), y.to(device)

        # Compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if batch % 50 == 0:
            teste_batch(dataloader, model, loss_fn)
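For reference, here is a minimal, self-contained sketch of the device handling such a loop needs, in the style of the quickstart; the `nn.Linear` model and random batch are just stand-ins for your real `model` and `dataloader` output:

```python
import torch
from torch import nn

# Pick the GPU if PyTorch can see one, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Stand-in model; your real model goes here. `.to(device)` moves its weights.
model = nn.Linear(4, 3).to(device)

# Inside the loop, every batch must be moved to the same device as the model.
X = torch.randn(8, 4).to(device)
pred = model(X)

# Parameters and batch now report the same device.
print(next(model.parameters()).device, X.device)
```

If the model's parameters and the input tensors are not on the same device, PyTorch raises an error rather than silently running on the CPU, so a loop that runs without errors is at least internally consistent.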

How did you check that the GPU is not used?
Note that the Windows task manager does not show the Compute tab by default and you might need to enable it. Alternatively, use nvidia-smi instead.
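You can also query the GPU from the Python side, independently of the Task Manager; a small sanity-check snippet (the printed values will of course depend on your machine):

```python
import torch

# Does the installed PyTorch binary see a CUDA device at all?
available = torch.cuda.is_available()
print(available)

if available:
    # Name of the first GPU, e.g. the GTX 1650
    print(torch.cuda.get_device_name(0))
    # Bytes of GPU memory currently held by tensors in this process
    print(torch.cuda.memory_allocated(0))

# CUDA version the PyTorch binary was built with (None on CPU-only builds)
print(torch.version.cuda)
```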

The Compute tab is enabled; that was how I checked. This other imgur link shows the python process in the Windows Task Manager while the training was happening, together with the nvidia-smi output.

Note: it is my first time using CUDA, so I didn't know about this nvidia-smi tool and don't know how to read its output.

Edit: I just realized that nvidia-smi displays CUDA version 12.0 but nvcc displays 11.7. Is it possible that I have two CUDA versions installed? This is weird because it is a recent Windows installation and I am the only user.

The CUDA version in nvidia-smi corresponds to the driver, and your PyTorch binary built with CUDA 11.7 should work correctly.
The output also shows a GPU utilization of 56% and different processes using GPU memory.
The Python process would then correspond to your PyTorch script and the GPU is thus used.


But still, the execution time of each epoch (now talking about the quickstart guide) is, on average, the same when using the CPU or the GPU: about 10 s per epoch. Shouldn't it be faster on the GPU? For reference, it is the mobile version of the GTX 1650 and the CPU is an Intel 10300H.

And I still didn't understand: why does it show 0% GPU usage, and about 50% (almost two cores) for the python process, in the Windows Task Manager?

Edit: finally, the python process in nvidia-smi shows no memory usage; shouldn't it be using memory?

You could profile your code to see if any part of the training creates a bottleneck and if the GPU is e.g. “starving”. This could happen if your data loading is too slow and would result in short peaks of GPU util. during the training.
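As a starting point, `torch.profiler` can break down where the time goes; a minimal sketch, again with a stand-in `nn.Linear` model in place of your real one:

```python
import torch
from torch import nn
from torch.profiler import profile, ProfilerActivity

model = nn.Linear(4, 3)        # stand-in for your real model
X = torch.randn(64, 4)

# Record CPU activity, and CUDA activity too if a GPU is available.
activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    activities.append(ProfilerActivity.CUDA)

with profile(activities=activities) as prof:
    for _ in range(10):
        model(X)

# A starving GPU shows up as long data-loading/CPU ops
# next to comparatively short CUDA kernel times.
report = prof.key_averages().table(sort_by="cpu_time_total", row_limit=5)
print(report)
```

Wrapping one real epoch (including the `DataLoader` iteration) in the `profile` context would show whether data loading dominates the time.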

The CPU is still used to execute the Python code and for e.g. data processing (assuming you don’t use the GPU for it via e.g. DALI).

True, the guide uses the FashionMNIST dataset, so loading the images could take processor time. I ran the same timing test on my model, which uses the iris dataset; that one is loaded entirely before training, so data loading would not be a bottleneck. I can see no other issue, as the training loop is really simple and, the way it is written, I suppose it should run entirely on the same device.

Running those timing tests again in this way, the CPU time is approximately 0.9 s per epoch and the GPU time is approximately 0.75 s. The difference still isn't big, but it is already something. Thank you for your help!
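One caveat when timing GPU code: CUDA kernels are launched asynchronously, so the clock must be read only after `torch.cuda.synchronize()`, otherwise the measurement covers just the kernel launches. A minimal sketch (stand-in model and data):

```python
import time
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(4, 3).to(device)    # stand-in for the real model
X = torch.randn(1024, 4).to(device)

# Wait for any pending GPU work before starting the clock...
if device == "cuda":
    torch.cuda.synchronize()
start = time.perf_counter()
for _ in range(100):
    model(X)
# ...and wait for the queued kernels to finish before stopping it.
if device == "cuda":
    torch.cuda.synchronize()
elapsed = time.perf_counter() - start
print(f"{elapsed:.4f}s for 100 forward passes on {device}")
```

Also, a model as small as an iris classifier may simply be too tiny to benefit much from a GPU: the per-call launch overhead can rival the compute itself, which would explain the small gap between 0.9 s and 0.75 s.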