Image shape inconsistency

abdualhag · January 3, 2019, 3:43am

Hi,

I followed the Finetuning tutorial to train a model on my image set, and training succeeded. When I load the model and test it, I get errors with some images but not all. This does not make sense to me, because the error is about the input size, which should be the same for every image after preprocessing.

Below is the code I used for classification:

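import torch
from PIL import Image
from torch.autograd import Variable
from torchvision import transforms

model = torch.load("./resnet.pt")
model.eval()
model.cpu()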


def image_loader(loader, image_name):
    image = Image.open(image_name)
    image = loader(image)
    image = image.unsqueeze(0)
    image = Variable(image)
    return image

input_size = 224
data_transforms = transforms.Compose([
    transforms.Resize(input_size),
    transforms.CenterCrop(input_size),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

prediction = model(image_loader(data_transforms, '4Isb3u5.png'))
prediction = prediction.data.numpy().argmax()
print(prediction)

The error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-12-f20545e3ab97> in <module>
     25 # Now let's load our model and get a prediciton!
     26 vgg = models.vgg16(pretrained=True)  # This may take a few minutes.
---> 27 prediction = vgg(img)  # Returns a Tensor of shape (batch, num class labels)
     28 prediction = prediction.data.numpy().argmax()  # Our prediction will be the index of the class label with the largest value.
     29 print (labels[prediction])  # Converts the index to a string using our labels dict

~\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
    487             result = self._slow_forward(*input, **kwargs)
    488         else:
--> 489             result = self.forward(*input, **kwargs)
    490         for hook in self._forward_hooks.values():
    491             hook_result = hook(self, input, result)

~\Anaconda3\envs\pytorch\lib\site-packages\torchvision\models\vgg.py in forward(self, x)
     40 
     41     def forward(self, x):
---> 42         x = self.features(x)
     43         x = x.view(x.size(0), -1)
     44         x = self.classifier(x)

~\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
    487             result = self._slow_forward(*input, **kwargs)
    488         else:
--> 489             result = self.forward(*input, **kwargs)
    490         for hook in self._forward_hooks.values():
    491             hook_result = hook(self, input, result)

~\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\container.py in forward(self, input)
     90     def forward(self, input):
     91         for module in self._modules.values():
---> 92             input = module(input)
     93         return input
     94 

~\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
    487             result = self._slow_forward(*input, **kwargs)
    488         else:
--> 489             result = self.forward(*input, **kwargs)
    490         for hook in self._forward_hooks.values():
    491             hook_result = hook(self, input, result)

~\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\conv.py in forward(self, input)
    318     def forward(self, input):
    319         return F.conv2d(input, self.weight, self.bias, self.stride,
--> 320                         self.padding, self.dilation, self.groups)
    321 
    322 

RuntimeError: Given groups=1, weight of size [64, 3, 3, 3], expected input[1, 1, 224, 382] to have 3 channels, but got 1 channels instead

The working image:

The image that causes the error: Link

Any help is appreciated. Thanks

vmirly1 · January 3, 2019, 3:51am

It seems that the image causing this error might be a grayscale image with only one channel. Can you verify the shape of the resulting tensor for that image before feeding it to the model?

abdualhag · January 3, 2019, 4:21am

I am not sure if that is the case, but it is worth pointing out that the image causing the error is one of the images I used to train the model.

As for your question, the shape printed in the error message is [64, 3, 3, 3]. If you need more specific information, it would help if you could point out the line of code that prints it.

Thanks,

vmirly1 · January 3, 2019, 4:33am

No, the shape [64, 3, 3, 3] is not the input's. It is the shape of the kernel (weights) of the convolution layer: 64 output channels, 3 input channels, and a 3x3 kernel. The input shape in the error message is [1, 1, 224, 382], i.e. a batch of one image with 1 channel.
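To make the two shapes in the error message concrete, here is a small sketch with a standalone convolution layer that has the same weight shape. The layer is built by hand for illustration and is not taken from the poster's model; the zero-filled inputs are also just placeholders.

```python
import torch
import torch.nn as nn

# A 3x3 convolution with 3 input channels and 64 output channels.
conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding=1)
print(tuple(conv.weight.shape))        # (64, 3, 3, 3)

rgb_batch = torch.zeros(1, 3, 224, 382)    # (batch, channels, height, width)
gray_batch = torch.zeros(1, 1, 224, 382)   # same size, but only 1 channel

out_shape = tuple(conv(rgb_batch).shape)
print(out_shape)                       # (1, 64, 224, 382)

try:
    conv(gray_batch)
    failed = False
except RuntimeError as exc:
    # The 1-channel input does not match the layer's 3 input channels.
    failed = True
    print(exc)
```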

abdualhag · January 3, 2019, 4:47am

Below is some information about the image. I am not sure if there is a handy function that returns the shape of the resulting tensor.

print(img.info)          # various metadata about the image
print(img.size)          # size of the image as (width, height)
print(img.getbands())    # names of the bands (channels) in the image
print(len(img.split()))  # number of channels

Output:

{'gamma': 0.45455, 'chromaticity': (0.3127, 0.329, 0.64, 0.33, 0.3, 0.6, 0.15, 0.06)}
(1196, 700)
('P',)
1

vmirly1 · January 3, 2019, 5:05am

So you can see that this image has only 1 channel (mode 'P' is a single band of palette indices), but the convolution layer expects 3 channels.
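The band check can be reproduced with Pillow alone. In this sketch a blank palette-mode image created with `Image.new` stands in for the real file (an assumption, since the file itself is not available here).

```python
from PIL import Image

# Blank palette-mode image used as a stand-in for the real file.
img = Image.new("P", (1196, 700))
print(img.mode, img.getbands(), len(img.split()))   # P ('P',) 1

# Converting to RGB expands the palette indices into three color bands.
rgb = img.convert("RGB")
print(rgb.mode, rgb.getbands(), len(rgb.split()))   # RGB ('R', 'G', 'B') 3
```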

abdualhag · January 3, 2019, 5:28am

Thank you so much

I have used the following to fix the issue:

rgb_im = img.convert('RGB')
rgb_im.save('audacious.jpg')

I still cannot make sense of why the error was not raised during training.