Getting an error trying to predict on a single image with a CNN in PyTorch

```
Traceback (most recent call last):
  File "pred.py", line 134, in <module>
    output = model(data)
RuntimeError: Expected 4-dimensional input for 4-dimensional weight [16, 3, 3, 3], but got 3-dimensional input of size [1, 32, 32] instead
```
I also tried changing the dimensions of the image with `input_var = input_var.view(1, 1, 32, 32)`, but then got this error:

```
Traceback (most recent call last):
  File "pred.py", line 135, in <module>
    data = data.view(1, 3, 32, 32)
RuntimeError: shape '[1, 3, 32, 32]' is invalid for input of size 1024
```
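For reference, a minimal sketch of what the two shape complaints mean (assuming a 32×32 input): the conv weight `[16, 3, 3, 3]` expects a 4-D `[N, 3, H, W]` batch, and 1024 = 1 × 32 × 32 elements cannot be reshaped into `[1, 3, 32, 32]`, which needs 3072:

```python
import torch

x = torch.rand(3, 32, 32)     # ToTensor() output: [C, H, W], no batch dimension
x = x.unsqueeze(0)            # -> [1, 3, 32, 32], what a Conv2d(3, 16, 3) layer expects

gray = torch.rand(1, 32, 32)  # a single-channel image: 1 * 32 * 32 = 1024 elements
# gray.view(1, 3, 32, 32)     # would fail: cannot reshape 1024 values into 3072
```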

Prediction code

```python
import torch
import torch.nn as nn
from PIL import Image
from torchvision import transforms

import models  # the models package from the training repo

normalize = transforms.Normalize(mean=[0.4914, 0.4824, 0.4467],
                                 std=[0.2471, 0.2435, 0.2616])
train_set = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    normalize,
])

model = models.condensenet(args)  # args: the argparse namespace used in training
model = nn.DataParallel(model)
PATH = "results/savedir/save_models/checkpoint_001.pth.tar"

model.load_state_dict(torch.load(PATH)['state_dict'])

device = torch.device("cpu")

model.eval()

image = Image.open("horse.jpg")
input = train_set(image)  # a single [3, 32, 32] tensor
train_loader = torch.utils.data.DataLoader(
    input,
    batch_size=1, shuffle=True, num_workers=1)
for i, data in enumerate(train_loader):

    # input_var = torch.autograd.Variable(data, volatile=True)
    # input_var = input_var.view(1, 3, 32, 32)

    output = model(data)

topk = (1, 5)
maxk = max(topk)

_, pred = output.topk(maxk, 1, True, True)
```
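A minimal sketch of why the first traceback reports a `[1, 32, 32]` input: `DataLoader` treats the single `[3, 32, 32]` image tensor itself as a dataset of length 3, so each "sample" is one `[32, 32]` channel plane, which batching turns into `[1, 32, 32]`:

```python
import torch
from torch.utils.data import DataLoader

loader = DataLoader(torch.rand(3, 32, 32), batch_size=1)
print(next(iter(loader)).shape)  # torch.Size([1, 32, 32])
```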

Instead of doing the for loop and train_loader, why don't you just pass the input directly into the model? Like this:

```python
input = train_set(image)
input = input.unsqueeze(0)  # add a batch dimension: [3, 32, 32] -> [1, 3, 32, 32]
model.eval()
output = model(input)
```
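As a side note, the commented-out `volatile=True` is deprecated; the modern way to disable gradient tracking at inference time is to wrap the forward pass in `torch.no_grad()`:

```python
with torch.no_grad():
    output = model(input)
```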

Thank you, it works, but the model is predicting at random, which it shouldn't be.

```python
prediction = int(torch.max(output.data, 1)[0].cpu().numpy())
# note: torch.max(..., 1) returns (values, indices); [0] picks the max *value*,
# not the predicted class index
```

In the training module a DataLoader was used; could this be the reason?
Link to the training module

No, this shouldn't change anything. In the transformations you should delete the horizontal flip and do a resize instead of the random crop.


Thank you, I have deleted the horizontal flip and random crop, but now I get `RuntimeError: mat1 dim 1 must match mat2 dim 0`.
Is there a way to resize the image to avoid this error, please?

Yes, you can use `transforms.Resize((32, 32))`.
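A sketch of what the full inference pipeline might then look like (`eval_transform` is just an illustrative name; `normalize` is the same one defined above, and there is no augmentation at test time):

```python
eval_transform = transforms.Compose([
    transforms.Resize((32, 32)),  # deterministic resize to the trained input size
    transforms.ToTensor(),
    normalize,
])
```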

I used this:

```python
transforms.Resize((32, 32))
```

The model's predictions have improved a lot; I'm really grateful for your help.


I noticed this: for horse.jpg the prediction might be 8, and if I run the code again the prediction is still the same, which is good. But if I try another picture of a horse, horse1.jpg, the prediction changes to a different number. I have used other images too and it was the same. I guess I still have some things to do on the image transformation?

It may just be that your model is overfitting. Do you use the random crop for training?

Yes, random crop was used

OK, what dataset are you using to train it?

I'm using the CIFAR-10 dataset.

OK, maybe use resize instead of random crop in your training. That might help.

Okay, I will try this out, thank you

I have tried using resize instead of random crop in training, but there's no improvement.

Hmm, OK. It may just be that CIFAR-10 does not have any horses of that color, so the model doesn't recognize them.

I have used multiple images; the predictions on similar objects are different.

Ya, that is what I meant. CIFAR-10 might not have horses of the same color as the test images you passed it, so it might not be able to recognize them. Did the model guess correctly for any of them?

Yes, it guessed correctly for one brown horse, but when I tried more brown horses the predictions were different; the same thing happened for other classes.

I re-ran the training module and made a change to this line

```python
correct_k = correct[:k].view(-1).float().sum(0)
```

after getting this error: `RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.`

I solved it by calling `.contiguous()` before `.view()`.
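For reference, with that fix applied the line would read:

```python
correct_k = correct[:k].contiguous().view(-1).float().sum(0)
```

(`.contiguous()` copies the sliced tensor into contiguous memory so `.view()` can reinterpret its shape; `.reshape(...)` would also work, as the error message suggests.)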

Also, in the prediction code I changed this

```python
prediction = int(torch.max(output.data, 1)[0].cpu().numpy())
```

to this

```python
topk = (1, 5)
maxk = max(topk)
_, pred = output.topk(maxk, 1, True, True)  # indices of the top-5 classes
pred = pred.t()                             # shape [maxk, batch]
print(pred)
```

Taking the first value of the tensor as my prediction, I have tried it on ten images of different classes and got 8 correct predictions. Happy to share this with you; thank you very much Dwight_Foster.
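For anyone following along, a small sketch (assuming the standard torchvision CIFAR-10 class ordering) of mapping that top-1 index to a label:

```python
# standard CIFAR-10 class order
classes = ['airplane', 'automobile', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck']
top1 = pred[0][0].item()  # first row of the transposed pred holds the top-1 index
print(classes[top1])      # e.g. 'horse' for index 7
```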