Why is my GPU not being utilized with PyTorch 1.7?

I have 16 GB of RAM, an MSI 1050 Ti graphics card, and an Intel 4770 processor.

I have already installed:

  1. CUDA=11.0
  2. CUDNN=8.0.5
  3. PYTORCH=1.7
  4. Windows=10

I checked the installation with the following script, run in a notebook:

import torch
import sys
print('__Python VERSION:', sys.version)
print('__pyTorch VERSION:', torch.__version__)
print('__CUDA VERSION', )
from subprocess import call
# call(["nvcc", "--version"]) does not work
! nvcc --version
print('__CUDNN VERSION:', torch.backends.cudnn.version())
print('__Number CUDA Devices:', torch.cuda.device_count())
# call(["nvidia-smi", "--format=csv", "--query-gpu=index,name,driver_version,,memory.used,"])
print('Active CUDA Device: GPU', torch.cuda.current_device())
print ('Available devices ', torch.cuda.device_count())
print ('Current cuda device ', torch.cuda.current_device())

__Python VERSION: 3.8.3 (default, Jul  2 2020, 17:30:36) [MSC v.1916 64 bit (AMD64)]
__pyTorch VERSION: 1.7.1
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Thu_Jun_11_22:26:48_Pacific_Daylight_Time_2020
Cuda compilation tools, release 11.0, V11.0.194
Build cuda_11.0_bu.relgpu_drvr445TC445_37.28540450_0
__Number CUDA Devices: 1
Active CUDA Device: GPU 0
Available devices  1
Current cuda device  0

The code I’m running is:

device = torch.device('cuda')

criterion = nn.NLLLoss()
# Only train the classifier parameters, feature parameters are frozen
optimizer = optim.Adam(model.classifier.parameters(), lr=0.001)
epochs = 1
steps = 0
running_loss = 0
print_every = 1

for epoch in range(epochs):
    for inputs, labels in trainloader:
        steps += 1
        # Move input and label tensors to the default device
        inputs, labels = inputs.to(device), labels.to(device)

        optimizer.zero_grad()
        logps = model.forward(inputs)
        loss = criterion(logps, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    if steps % print_every == 0:
        test_loss = 0
        accuracy = 0
        with torch.no_grad():
            for inputs, labels in testloader:
                inputs, labels = inputs.to(device), labels.to(device)
                logps = model.forward(inputs)
                batch_loss = criterion(logps, labels)

                test_loss += batch_loss.item()

                # Calculate accuracy
                ps = torch.exp(logps)
                top_p, top_class = ps.topk(1, dim=1)
                equals = top_class == labels.view(*top_class.shape)
                accuracy += torch.mean(equals.type(torch.FloatTensor)).item()

        print(f"Device = {device};"
              f"Epoch {epoch+1}/{epochs}.. "
              f"Train loss: {running_loss/print_every:.3f}.. "
              f"Test loss: {test_loss/len(testloader):.3f}.. "
              f"Test accuracy: {accuracy/len(testloader):.3f}")
        running_loss = 0

These are the stats when I run nvidia-smi and Task Manager:

I think your GPU is being utilized. Have you started training and found that the GPU is still not being used? Did you check nvidia-smi while training was running?
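Besides nvidia-smi, you can also check from inside PyTorch where the model and its outputs actually live. This is only a rough sketch: the small `nn.Sequential` model is a hypothetical stand-in for yours, and it falls back to the CPU if CUDA is not available.

```python
import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Hypothetical stand-in for the model in the question; use your own model here.
model = nn.Sequential(nn.Linear(8, 4), nn.LogSoftmax(dim=1)).to(device)

# The device of the first parameter tells you where the weights are stored.
param_device = next(model.parameters()).device

# Run one small random batch created directly on the same device.
inputs = torch.randn(2, 8, device=device)
logps = model(inputs)

print('CUDA available:', torch.cuda.is_available())
print('Parameters on:', param_device)
print('Outputs on:', logps.device)
if device.type == 'cuda':
    # Memory currently held by tensors on this GPU, in bytes.
    print('GPU memory allocated (bytes):', torch.cuda.memory_allocated(device))
```

If the parameters report `cpu` while the inputs are on `cuda`, the model was never moved with `model.to(device)`.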

Thank you @user_123454321. I have edited the description. Could you read it again, please?

I think the last process in the list is the one for training. So, I guess the GPU is being utilized but the utilization is low.

What should I do to increase the GPU utilization and speed up training?

Can you check whether data loading is the bottleneck by training the model on random inputs?
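Something along these lines would do as a rough test. It is only a sketch under assumptions: the model, the 3x64x64 input shape, and the 10 classes are hypothetical placeholders, so swap in your own model and shapes. The random batch is created on the device once, so the timing covers only the forward and backward passes and none of the data loading.

```python
import time

import torch
import torch.nn as nn
import torch.optim as optim

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Hypothetical stand-in model; replace it with the real one.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 64 * 64, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
    nn.LogSoftmax(dim=1),
).to(device)
criterion = nn.NLLLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)


def time_random_batches(num_batches=20, batch_size=32):
    """Return the average seconds per training step on random data."""
    # Build the fake batch on the device once, so no loading is timed.
    inputs = torch.randn(batch_size, 3, 64, 64, device=device)
    labels = torch.randint(0, 10, (batch_size,), device=device)
    if device.type == 'cuda':
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(num_batches):
        optimizer.zero_grad()
        loss = criterion(model(inputs), labels)
        loss.backward()
        optimizer.step()
    if device.type == 'cuda':
        # CUDA calls are asynchronous; wait for queued work before stopping the clock.
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / num_batches


step_time = time_random_batches()
print(f'Device = {device}; {step_time * 1000:.2f} ms per step on random inputs')
```

If a step on random inputs is much faster than a step fed from your `trainloader`, loading is probably the bottleneck, and the `num_workers` and `pin_memory` arguments of `DataLoader` would be worth trying.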

Thanks @user_123454321. I followed the steps in the video listed below, and now my GPU is being utilized well.