Testing takes over 5 hours with no output, but training and validation are super fast

I would like to know if anyone knows why my training and validation run quickly, but my testing (where I print out the F1 score and classification report) takes over 5 hours and then times out on Google Colab.

Any guidance would be useful

You should check whether you're using the GPU in your test code.

The Colab notebook runs on a GPU, but I am unsure how to check that the testing code is actually using it.
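In case it helps, here is a minimal sketch of how to check which device PyTorch is using (assuming a standard setup; the `model` name at the end is a placeholder for whatever network you built):

```python
import torch

# Pick the GPU if the Colab runtime actually provides one.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)  # "cuda" means the GPU runtime is active

# Any tensor reports which device it lives on.
x = torch.zeros(3).to(device)
print(x.device)

# For a model, every parameter should be on the GPU after model.to(device):
# print(next(model.parameters()).device)  # "model" is your network (assumed name)
```

If `device` prints `cpu`, the runtime type in Colab needs to be switched to GPU before any of the `.to(device)` calls will help.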

    predlist = torch.zeros(0, dtype=torch.long, device='cpu')
    lbllist = torch.zeros(0, dtype=torch.long, device='cpu')

    with torch.no_grad():
        for test_batch_data, test_batch_labels in allTheDataloaders["Testing"]:
            test_batch_data, test_batch_labels = test_batch_data.to(device), test_batch_labels.to(device)

            y_test_pred = model(test_batch_data)  # forward pass ("model" name assumed)
            y_scores, y_pred_targets = torch.max(y_test_pred, dim=1)

            # accumulate predictions and labels for the report below
            predlist = torch.cat([predlist, y_pred_targets.view(-1).cpu()])
            lbllist = torch.cat([lbllist, test_batch_labels.view(-1).cpu()])

    print(classification_report(lbllist.numpy(), predlist.numpy()))

    data = confusion_matrix(lbllist.numpy(), predlist.numpy())
    data_cm = pd.DataFrame(data, columns=np.unique(lbllist.numpy()),
                           index=np.unique(lbllist.numpy()))
    data_cm.index.name = 'Actual'
    data_cm.columns.name = 'Predicted'

    plt.figure(figsize=(10, 7))
    sn.set(font_scale=1.4)  # for label size
    sn.heatmap(data_cm, cmap="Blues", annot=True, annot_kws={"size": 16}, fmt=".1f")

It only takes long for one specific dataset.


Can you print(device)?

@omarfoq it prints

Well, in that case you are running on the GPU. Can you check that the batch size for testing is not very small? Also, the operation you use to stack predictions is not optimal, because you keep copying small chunks of data from the GPU to the CPU. It's probably better to keep all predictions on the GPU and send everything to the CPU at the end.
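On the batch-size point, here is a self-contained sketch (the dataset here is a dummy stand-in; the real test Dataset and loader names from the thread would go in its place). A larger `batch_size` means far fewer iterations, and fewer per-batch GPU-to-CPU copies:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy stand-in for the real test set (names are illustrative, not from the thread).
testing_dataset = TensorDataset(torch.randn(1000, 8), torch.randint(0, 4, (1000,)))

test_loader = DataLoader(
    testing_dataset,
    batch_size=256,   # larger batches keep the GPU busy; tune to fit memory
    shuffle=False,    # order doesn't matter for evaluation
    num_workers=0,    # raise this if CPU-side data loading is the bottleneck
    pin_memory=torch.cuda.is_available(),  # faster host-to-GPU copies
)

print(len(test_loader))  # 1000 samples / 256 per batch -> 4 batches
```

With `batch_size=1` the same loop would run 1000 iterations instead of 4, which is exactly the kind of overhead that makes evaluation crawl.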

@omarfoq do you have any examples I can refer to? I am new to PyTorch and have been using tutorials to implement this.

Make sure you call model.cuda() or model.to(device) so the model runs on the GPU.
Instead of writing predlist=torch.cat([predlist, y_pred_targets.view(-1).cpu()]), keep all your tensors on the GPU: use predlist=torch.cat([predlist, y_pred_targets.view(-1)]) and initialise predlist=torch.zeros(0, dtype=torch.long, device=device), where device refers to your GPU. Then at the end just do predlist=predlist.cpu() if your code requires numpy arrays.

@cyRi-Le Hi,
Thank you.
I am trying this now and hopefully it doesn't take too long.
It works perfectly fine with my other data loaders; it's just one specific one that seems to have an issue.