Integrate original image shape information in CNN for image classification

Hi everyone,

I have a question about one of my projects and I hope you can help me solve it :smiley: . I have to classify images into 15 classes. The images come from a detector, and I then use ResNet18 to classify them.

But I know that the original shape of the detected object (the crop produced by the detector) gives me a lot of information about how the cropped image should be classified.

Since I have to resize all images to 224x224 for ResNet18, I'm losing the shape information. So my question is: is it possible to add this information to the CNN? How can I do it in PyTorch? Is it a common practice in computer vision?

Thank you all

Since resnet18 uses an adaptive pooling layer before the flatten operation, the input shapes are flexible.
However, this might of course degrade your performance, and the architecture might not make sense if your inputs are too small.

Your use case reminds me a bit of CoordConv by Uber, which provided a coordinate system to their inputs. Would something like this make sense for your use case?
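
A rough sketch of the CoordConv idea, in case it helps: two extra channels holding the normalized y and x position of each pixel are concatenated to the input before a regular convolution. This is not the reference implementation; the class names and the [-1, 1] normalization are assumptions made for illustration.

```python
import torch
import torch.nn as nn


class AddCoords(nn.Module):
    """Append normalized y and x coordinate channels to a (N, C, H, W) batch."""

    def forward(self, x):
        n, _, h, w = x.shape
        # Coordinates run from -1 to 1 along each spatial axis (an assumption).
        ys = torch.linspace(-1.0, 1.0, h, device=x.device, dtype=x.dtype)
        xs = torch.linspace(-1.0, 1.0, w, device=x.device, dtype=x.dtype)
        yy = ys.view(1, 1, h, 1).expand(n, 1, h, w)
        xx = xs.view(1, 1, 1, w).expand(n, 1, h, w)
        return torch.cat([x, yy, xx], dim=1)


class CoordConv2d(nn.Module):
    """A Conv2d that sees two extra coordinate channels in its input."""

    def __init__(self, in_channels, out_channels, **conv_kwargs):
        super().__init__()
        self.add_coords = AddCoords()
        self.conv = nn.Conv2d(in_channels + 2, out_channels, **conv_kwargs)

    def forward(self, x):
        return self.conv(self.add_coords(x))


layer = CoordConv2d(3, 8, kernel_size=3, padding=1)
out = layer(torch.randn(2, 3, 32, 48))
print(tuple(out.shape))  # (2, 8, 32, 48)
```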


What do you want to do with this information?

This is very interesting work, thanks for sharing @ptrblck. So if I understand correctly, one solution could be to add an additional channel to my image (and one more input channel to the CNN).
But since the coordinates I want to add have shape 1x4 and the images are 224x224, do you think I have to repeat these coordinates to reach a shape of 224x224 and stack them onto the current image?

Hi @ebarsoum, with this additional data I want to give the CNN more ability to classify the images, because they are very difficult for a human to tell apart, but just the original shape can sometimes help me classify them.

For example, I know that one of my categories always has a rectangular shape.

But I cannot base the decision only on the shape, which is why I'm using a CNN.