Predictions from DataLoader: RuntimeError if using cpu(), TypeError if using cuda()

I’m learning PyTorch basics. I have successfully trained a ResNet50-based model in a Jupyter Notebook using cuda. It achieves around 85% accuracy, and I’d like to explore whether there are patterns in which classes it struggles with. To obtain the predictions and labels so that I can build a confusion matrix and begin that exploration, I adapted code from a Microsoft Azure tutorial:

#Pytorch doesn't have a built-in confusion matrix metric, so we'll use SciKit-Learn
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
%matplotlib inline

# Set the model to evaluate mode
model.eval()

# Get predictions for the test data and convert to numpy arrays for use with SciKit-Learn
print("Getting predictions from test set...")
truelabels = []
predictions = []
#probabilities = []
for data, target in test_loader:
    for label in target.cpu().data.numpy():
        truelabels.append(label)
    for prediction in model.cpu()(data).data.numpy().argmax(1):
        predictions.append(prediction)

The problem I’m encountering is that when I use target.cpu() and model.cpu() in the above code, it starts running very slowly then throws the following error:

RuntimeError: [enforce fail at …\c10\core\CPUAllocator.cpp:79] data. DefaultCPUAllocator: not enough memory: you tried to allocate 62914560 bytes.

Or it crashes the Google Chrome tab that the Jupyter Notebook is open in.

I trained the model using cuda, but if I switch to target.cuda() and model.cuda() it throws the following error:

TypeError: can’t convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

Since I adapted this from a tutorial that used a much smaller dataset as a proof of concept, I’m not sure if this is even a proper way to explore the results. Can someone please advise me on best practice in this regard, or recommend a workaround for the issues I’m encountering?

Update: I used the new pre-trained vit-b-16 model and did not get the RuntimeError from the posted code snippet with that model. I’m still interested in using cuda() if possible to speed up the process, and in hearing whether there is a better way to accomplish my objective.

The RuntimeError from the CPUAllocator is raised when you run out of host RAM; you would need to reduce memory usage, e.g. by decreasing the batch size of the DataLoader.

The TypeError is raised because you are trying to convert a CUDA tensor to a numpy array, which doesn’t work without pushing the tensor to the CPU first, as described in the error message.
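For example, a minimal sketch (the availability check is only there so it also runs on a CPU-only machine):

```python
import torch

t = torch.ones(3)
if torch.cuda.is_available():
    t = t.cuda()       # on a CUDA tensor, t.numpy() raises the TypeError above
arr = t.cpu().numpy()  # .cpu() is a no-op if the tensor is already on the host
print(arr)             # [1. 1. 1.]
```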

I tried pushing the tensor to the CPU and it didn’t work at first, but then I found that, for whatever reason, the inputs from the loader and the model weights had different tensor types (i.e. they were on different devices). After explicitly sending the data to cuda before prediction and then explicitly pushing the predictions back to the cpu, it now works as expected. Thanks for the help.
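Roughly, the device-aware version of the loop looks like this (with a tiny stand-in model and loader in place of my real ResNet50 and test_loader, and `torch.no_grad()` added to keep memory flat during inference):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from sklearn.metrics import confusion_matrix

# Stand-ins for illustration only -- swap in your trained model and real test_loader
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Linear(10, 3).to(device)
test_data = TensorDataset(torch.randn(32, 10), torch.randint(0, 3, (32,)))
test_loader = DataLoader(test_data, batch_size=8)

model.eval()                      # evaluation mode (fixes dropout/batchnorm behavior)
truelabels = []
predictions = []
with torch.no_grad():             # skip autograd bookkeeping during inference
    for data, target in test_loader:
        data = data.to(device)    # move the batch to the model's device
        output = model(data)
        predictions.extend(output.argmax(1).cpu().numpy())  # back to host for numpy
        truelabels.extend(target.numpy())                   # targets stayed on the CPU

cm = confusion_matrix(truelabels, predictions)
print(cm.sum())                   # total number of test samples
```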