Training with GPU

Hello, I have an RTX 2060 with a Ryzen 5 2600X. I am trying to train my CNN on the GPU, but even though I use the GPU, training is very slow, almost the same as on the CPU.

After some research I saw a post on Stack Overflow saying that I cannot use the GPU if a display is attached to it.

Is that true?

My nvidia-smi looks like this.

| NVIDIA-SMI 445.75       Driver Version: 445.75       CUDA Version: 11.0     |
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  GeForce RTX 2060   WDDM  | 00000000:1C:00.0  On |                  N/A |
| 22%   53C    P2    29W / 160W |   3866MiB /  6144MiB |      6%      Default |

| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|    0   N/A  N/A     21072      C   ..._torch\Scripts\python.exe    N/A      |

My network, images, and labels are all moved to CUDA, but it is still very slow.

No, that is not true. A display attached to the GPU might use some of its memory, but you should still be able to use the GPU for processing in PyTorch (at least that's how I'm doing it on my Linux box and laptop).

Could you post your training code so that we can have a look at it?


Here's my code:


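import os

import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from skimage import io  # assumption: io.imread comes from scikit-image
from torch.utils.data import DataLoader, Dataset
from torch.utils.tensorboard import SummaryWriter
from tqdm import tqdm


class Network(nn.Module):
    def __init__(self):
        super().__init__()  # must run before layers are assigned
        # Input images are 32x32 with a single channel.
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=12, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=12, out_channels=24, kernel_size=5)
        # 32 -> conv 28 -> pool 14 -> conv 10 -> pool 5, hence 24*5*5.
        self.fc1 = nn.Linear(in_features=24*5*5, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=2)

    def forward(self, tensor):
        tensor = F.relu(self.conv1(tensor))
        tensor = F.max_pool2d(tensor, kernel_size=2, stride=2)
        tensor = F.relu(self.conv2(tensor))
        tensor = F.max_pool2d(tensor, kernel_size=2, stride=2)
        tensor = tensor.flatten(start_dim=1)
        tensor = F.relu(self.fc1(tensor))
        tensor = self.out(tensor)
        return tensor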


def findKey(value):
    # `classes` (not shown in this post) maps folder names to integer labels.
    key = "".join([k for (k, v) in classes.items() if v == value])
    return key

class CatDog(Dataset):
    def __init__(self, csv_file, root_dir, transform=None):
        self.annotations = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.transform = transform

    def __len__(self):
        return len(self.annotations)

    def __getitem__(self, index):
        # Column 0 holds the file name, column 1 the integer label.
        img_folder = findKey(self.annotations.iloc[index, 1])
        img_name = self.annotations.iloc[index, 0]
        img_path = os.path.join(self.root_dir, img_folder, img_name)
        image = io.imread(img_path)
        label = int(self.annotations.iloc[index, 1])
        if self.transform:
            image = self.transform(image)
        return image, label


device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")


def getNumCorrect(predictions, labels):
    return predictions.argmax(dim=1).eq(labels).sum().item()


TRANSFORM = transforms.Compose([transforms.ToTensor()])
catdogdataset = CatDog(csv_file, root_folder, TRANSFORM)
train_loader = DataLoader(catdogdataset, batch_size=50, shuffle=True)

comment = "batch_size=50 lr=0.01"
tb = SummaryWriter(comment=comment)

network = Network()
network = network.to(device)
optimizer = optim.Adam(network.parameters(), lr=0.01)

for epoch in range(20000):
    total_loss = 0
    total_correct = 0

    for batch in tqdm(train_loader):
        images, labels = batch
        images, labels = images.to(device), labels.to(device)

        grid = torchvision.utils.make_grid(images)
        tb.add_image("images", grid)
        tb.add_graph(network, images)

        predictions = network(images)
        loss = F.cross_entropy(predictions, labels)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        total_loss += loss.item()
        total_correct += getNumCorrect(predictions, labels)

    accuracy = total_correct / len(catdogdataset)


Images are 32x32 grayscale.

Each epoch takes around 2 minutes 30 seconds on the GPU and 2 minutes 45 seconds on the CPU.

Is this normal? I am using the Kaggle Cats and Dogs dataset.

Thanks for your help.
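
When GPU and CPU epochs take nearly the same time, one way to narrow down the cause is to time each stage of the loop separately and see which one dominates. Below is a minimal sketch using only the standard library. The `timed` helper, the `stage_seconds` dictionary, and the stage names are hypothetical, and the `time.sleep` calls are stand-ins for the real work (data loading, TensorBoard logging, forward/backward pass).

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per named stage (hypothetical helper state).
stage_seconds = defaultdict(float)


@contextmanager
def timed(stage):
    """Add the time spent inside the `with` block to stage_seconds[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_seconds[stage] += time.perf_counter() - start


def report():
    """Return (stage, seconds) pairs, slowest stage first."""
    return sorted(stage_seconds.items(), key=lambda kv: kv[1], reverse=True)


# Stand-in workload: in a real loop the `with` blocks would wrap the
# actual logging calls and the forward/backward pass for each batch.
for _ in range(3):
    with timed("logging"):
        time.sleep(0.05)
    with timed("forward_backward"):
        time.sleep(0.001)

for stage, seconds in report():
    print(f"{stage:>18}: {seconds:.3f}s")
```

Note that CUDA kernels are launched asynchronously, so when timing GPU work it helps to call `torch.cuda.synchronize()` before reading the clock; otherwise the time may be attributed to a later stage.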

OK, I found the issue: this line was bottlenecking the training.

tb.add_graph(network, images)

My training speed is good after getting rid of that line.
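
If the graph is still wanted in TensorBoard, one option is to log it a single time instead of on every batch, since the architecture does not change during training. A rough sketch of that idea follows. `train_one_epoch` and `train_step` are hypothetical names, not part of the code above, and `writer` is assumed to behave like a `SummaryWriter`.

```python
def train_one_epoch(network, loader, train_step, writer):
    """Run one epoch, logging the model graph only for the first batch.

    `train_step` is a hypothetical callable that performs the forward pass,
    backward pass, and optimizer step for one batch and returns the loss
    as a float. `writer` is expected to expose add_graph(model, input).
    """
    total_loss = 0.0
    for step, (images, labels) in enumerate(loader):
        if step == 0:
            # Logging the graph is costly; once is enough because the
            # architecture is the same for every batch.
            writer.add_graph(network, images)
        total_loss += train_step(images, labels)
    return total_loss
```

Calling `add_graph` once before the epoch loop would work just as well, and the same reasoning applies to `tb.add_image`, which likely does not need to run on every batch either.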