My model is not using the GPU even after successful conversion

I moved my model to the GPU and verified the setup: the CUDA device is active (cuda.is_available() returns True) and the model is on it (next(model.parameters()).is_cuda returns True). I have also confirmed that the data X and Y in the training function are on the GPU. Still, when I run the code the model trains slowly: GPU utilization is only 4 to 5%, and CPU usage spikes to 100% at times.
What should I double-check?

import os
import torch
import torch.nn as nn
from torch import cuda
from torch.backends import cudnn
from torch.utils.data import DataLoader
from torchsummary import summary  # assumption: summary() comes from torchsummary
# BaseNetwork and DatasetLoader are the poster's own modules (not shown here).

def main():
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = '0'
    print('__Number of CUDA Devices:', cuda.device_count(), ', active:', cuda.current_device())
    print('Device name: .... ', cuda.get_device_name(cuda.current_device()), ', available >', cuda.is_available())

    model = BaseNetwork.TestModel()
    model = nn.DataParallel(model, device_ids=[0])
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    cudnn.benchmark = True
    model.to('cuda')  # reconstructed from a garbled line in the post
    summary(model, (3, 236, 236))

    base_lr = 0.0001
    epochs = 200
    workers = 0
    momentum = 0.9
    weight_decay = 1e-3
    best_prec1 = 1e20
    k = 0

    optimizerr = torch.optim.Adam(model.parameters(), lr=base_lr, weight_decay=weight_decay, betas=(0.9, 0.95))
    criterion = nn.MSELoss().cuda()

    dataset_path = r'D:\My Research\Video Summarization\VS via Saliency\SIP'
    d_type = ['Train', 'Test']
    train_data = DatasetLoader(dataset_path, d_type[0])
    train_loader = DataLoader(train_data, 4, shuffle=True, num_workers=2, pin_memory=True, drop_last=True)

    test_data = DatasetLoader(dataset_path, d_type[1])
    test_loader = DataLoader(test_data, 4, shuffle=False, num_workers=2, pin_memory=True, drop_last=True)

    for epoch in range(0, epochs):
        train(model, optimizerr, criterion, train_loader)
        print("Epoch: %d, of epochs: %d" % (epoch, epochs))

def train(model, opt, crit, train_loader):
    for i, (X, Y) in enumerate(train_loader):
        X = X.to('cuda')  # reconstructed from a garbled line in the post
        #print('X in train model is on GPU: ', X.is_cuda)
        Y = Y.to('cuda')  # reconstructed from a garbled line in the post
        #print('Y in train model is on GPU: ', Y.is_cuda)

        output = model(X)

        loss = crit(output, Y)


Your training might be suffering from a data loading bottleneck (or any other bottleneck that starves the GPU). Have a look at this post for more information about data loading, and profile your code to find the slowest part.
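One rough way to do that profiling is to split each iteration into "time spent waiting for the next batch" and "time spent in the training step". The sketch below is only an illustration, not an official PyTorch tool: profile_loop, step_fn and sync are hypothetical names, and the helper works on any iterable. Because CUDA kernels are launched asynchronously, on a GPU you would pass torch.cuda.synchronize as the sync argument so that queued work is counted in the step time.

```python
import time


def profile_loop(loader, step_fn, sync=None, max_batches=50):
    """Split wall-clock time into waiting for data vs. running the step.

    loader      : any iterable of batches (for example a DataLoader)
    step_fn     : callable that takes one batch (forward/backward/update)
    sync        : optional zero-argument callable invoked after each step;
                  on a GPU, pass torch.cuda.synchronize (assumption: torch in use)
    max_batches : stop after this many batches
    """
    data_time = 0.0
    step_time = 0.0
    n = 0
    it = iter(loader)
    while n < max_batches:
        t0 = time.perf_counter()
        try:
            batch = next(it)  # blocks until the loader has a batch ready
        except StopIteration:
            break
        t1 = time.perf_counter()
        step_fn(batch)
        if sync is not None:
            sync()  # wait for asynchronous device work to finish
        t2 = time.perf_counter()
        data_time += t1 - t0
        step_time += t2 - t1
        n += 1
    return {"batches": n, "data_s": data_time, "step_s": step_time}
```

If data_s is much larger than step_s, the GPU is mostly idle waiting for input, which would match the low utilization and high CPU load described in the question. Note that with worker processes the first batch also includes worker startup time.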


Thanks a lot, @ptrblck, this comment from your link worked for me…

  • Don’t leave the dataloader pin_memory=True on by default in your code. There was a reason why PyTorch authors left it as False. I’ve run into many situations where True definitely does cause extremely negative paging/memory subsystem impact. Try both.
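For anyone who wants to "try both" in a repeatable way, here is a minimal timing sketch. It is an assumption-laden illustration rather than anything from the linked post: compare_loader_settings, time_one_pass, make_loader and consume are hypothetical names, and the helper just times one pass over a freshly built loader for each candidate setting.

```python
import time
from itertools import islice


def time_one_pass(loader, consume, max_batches=100):
    """Return the seconds taken to pull up to max_batches batches and consume them."""
    start = time.perf_counter()
    for batch in islice(loader, max_batches):
        consume(batch)
    return time.perf_counter() - start


def compare_loader_settings(make_loader, candidates, consume, repeats=3, max_batches=100):
    """Time each candidate setting and return {label: best_seconds}.

    make_loader : factory called as make_loader(**kwargs), returns an iterable
    candidates  : dict mapping a label to the kwargs for make_loader
    consume     : callable applied to every batch (for example a host-to-GPU copy)
    """
    results = {}
    for label, kwargs in candidates.items():
        timings = []
        for _ in range(repeats):
            loader = make_loader(**kwargs)  # fresh loader for every run
            timings.append(time_one_pass(loader, consume, max_batches))
        results[label] = min(timings)  # best of N reduces noise from other processes
    return results


# Hypothetical usage with the loader from the question (needs torch and a GPU):
#   make_loader = lambda pin_memory: DataLoader(train_data, 4, shuffle=True,
#       num_workers=2, pin_memory=pin_memory, drop_last=True)
#   candidates = {"pinned": {"pin_memory": True}, "unpinned": {"pin_memory": False}}
#   consume should copy the batch to the GPU and then call torch.cuda.synchronize()
```

Which setting wins depends on the machine, available RAM and dataset, so the numbers are only meaningful for the system they were measured on.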