Network Only Predicts the First Image Successfully but Not the Rest

I’m testing my network with a batch of images (following this tutorial). I’m taking the first 5 images of the batch and displaying them along with their ground-truth labels:

# Show the first 5 test images and print their ground-truth labels
test_images, test_labels = next(iter(testloader))
fig = plt.figure(figsize=(15, 4))

for i in range(5):
    ax = fig.add_subplot(1, 5, i + 1)
    # Reorder (C, H, W) -> (H, W, C), the layout matplotlib expects
    ax.imshow(test_images[i].permute(1, 2, 0))
    ax.axis("off")

print('Truth Answer: ', ' '.join('%5s' % label_guide[test_labels[j].item()] for j in range(5)))

Next, I load the model I created and pass the test images through it to get predictions:

vehicle_classifier = Test_Network()
vehicle_classifier.load_state_dict(torch.load(PATH))
outputs = vehicle_classifier(test_images)
_, prediction = torch.max(outputs, 1)
print('Prediction: ', ' '.join(label_guide[prediction[i].item()] for i in range(5)))

The network seems to do fine on the first image (which in this case is an airplane). However, the predictions for the other 4 images are all Airplane as well. I also checked the full prediction tensor, and almost every image in the batch is labeled as class 0 (Airplane). Both outputs are below:

Prediction:  Airplane Airplane Airplane Airplane Airplane
tensor([0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0])

Does anyone have an idea why it is doing this? In the tutorial, the network gave a different prediction for each image, but mine seems to give all of them the same label as the first one ):

It looks like you are testing on images from a different dataset than the standard CIFAR10 dataset (if my eyes don’t deceive me, these look to be much higher resolution than 32x32).
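
A quick way to confirm the resolution is to print a summary of one batch from each loader and compare them. This is only a rough sketch: `describe_batch` is a hypothetical helper (not part of PyTorch), and the random tensor below stands in for a real batch from your `testloader`.

```python
import torch


def describe_batch(images: torch.Tensor) -> dict:
    """Summarize an (N, C, H, W) image batch so two batches can be compared."""
    if images.ndim != 4:
        raise ValueError(f"expected a 4-D (N, C, H, W) batch, got {images.ndim} dims")
    n, c, h, w = images.shape
    return {
        "batch_size": n,
        "channels": c,
        "height": h,
        "width": w,
        "min": images.min().item(),
        "max": images.max().item(),
        # One mean per color channel, averaged over batch and pixels
        "channel_mean": images.float().mean(dim=(0, 2, 3)).tolist(),
    }


# Stand-in batch (assumption): random values in place of real test images
fake_test_images = torch.rand(4, 3, 224, 224)
summary = describe_batch(fake_test_images)
print(summary["height"], summary["width"])
```

Running the same helper on a training batch and a test batch should make any mismatch in size or value range obvious at a glance.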

What size are the test images being resized to? If they are 224x224 while the model was trained on 32x32, we would expect model performance to be poor due to the scale difference between the training and testing distributions. While you’re checking the resizing in the testing dataloader, it might also be a good time to check that all of the normalizations (e.g., scaling of color channels) match what was done at training time as well.
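
If it does turn out to be a mismatch, something along these lines would bring a test batch in line with the training setup. It is a sketch under assumptions, not your actual pipeline: the 32x32 size and the 0.5 mean/std per channel are guesses at what was used in training, and `match_training_preprocessing` is a made-up name. Swap in whatever your training transform really does.

```python
import torch
import torch.nn.functional as F

# Assumptions: replace these with the values from your own training transform
TRAIN_SIZE = (32, 32)
TRAIN_MEAN = torch.tensor([0.5, 0.5, 0.5]).view(1, 3, 1, 1)
TRAIN_STD = torch.tensor([0.5, 0.5, 0.5]).view(1, 3, 1, 1)


def match_training_preprocessing(images: torch.Tensor) -> torch.Tensor:
    """Resize and normalize a float (N, 3, H, W) batch with values in [0, 1]."""
    # Downscale (or upscale) every image to the resolution used in training
    resized = F.interpolate(
        images, size=TRAIN_SIZE, mode="bilinear", align_corners=False
    )
    # Apply the same per-channel normalization assumed for training
    return (resized - TRAIN_MEAN) / TRAIN_STD


# Stand-in for a batch of higher-resolution test images
raw = torch.rand(5, 3, 224, 224)
prepared = match_training_preprocessing(raw)
print(prepared.shape)  # torch.Size([5, 3, 32, 32])
```

In a real dataloader you would normally do this inside the dataset's transform instead (torchvision's `transforms.Resize` and `transforms.Normalize`), so that training and testing share the same definition.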