Very slow training on RTX 2070 Max-Q

Hey, I am trying to train a CNN from scratch with 4 convolution layers. I am using a dataset from Kaggle, and the code is hosted on GitHub: https://github.com/vishal-purohit/Histopathological-Image-Classification

Following are the specifications of my laptop:
-> RAM: 16 GB
-> Intel i7, 9th gen (4.0 GHz)
-> NVIDIA RTX 2070 Max-Q (8 GB)
-> Windows 10
-> PyTorch version 1.5

RAM usage: 11.7 GB
CPU usage: 14% - 25%
GPU usage: 0% - 6%
GPU clock speed: 885 MHz
Batch size: 2048
GPU memory used: 6.7 GB / 8 GB

The GPU utilization is very low: mostly it sits at 0%, with occasional spikes up to 6%. What might be the bottleneck in this case?

One epoch takes approximately 22 minutes.

I have not looked extremely closely at your code, but I noticed two things that are missing and might have some impact on performance: both aim to improve CPU and GPU operation interleaving by using asynchronous data transfers. This might not have a huge impact in your case, but since it is quick and easy to implement, I recommend giving it a shot.

First, on your data loaders, add the keyword argument pin_memory=True. Second, in your train/eval loop where you copy data from CPU memory to GPU memory using .to(...), append the keyword argument non_blocking=True.

Specific examples of these changes to your notebook:

# DataLoader with pin_memory=True
train_data_loader = DataLoader(train_dataset, batch_size=2048, shuffle=True, num_workers=0, pin_memory=True)

# Tensor.to(...) with non_blocking=True
xb = xb.to(device, non_blocking=True)

How much impact this will have in your specific case is really hard to determine, since I cannot see how much time the GPU spends waiting on data transfer, etc. But again, making these changes is quick, so I recommend testing to see if it speeds things up. If the impact is minimal, we will need to see more details related to GPU utilization; e.g. output from nvprof or similar.
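If it helps, here is a minimal sketch of how you could collect that kind of timing information from inside your notebook with PyTorch's built-in autograd profiler (available in 1.5). The names model, criterion, xb, yb, and device are assumptions based on a typical training loop, not taken from your notebook:

# Minimal profiling sketch for one training step (variable names are assumed)
import torch
from torch.autograd import profiler

with profiler.profile(use_cuda=True) as prof:
    xb = xb.to(device, non_blocking=True)   # host-to-device copy
    yb = yb.to(device, non_blocking=True)
    out = model(xb)                          # forward pass
    loss = criterion(out, yb)
    loss.backward()                          # backward pass
print(prof.key_averages().table(sort_by="cuda_time_total"))

The resulting table should show whether the time in a single step is dominated by GPU kernels or by CPU-side work such as data loading and transfer.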

No change in the training speed. GPU utilization is still 0% - 2%, and sometimes 6%.