Hello guys, I have been looking through the discussion forums for how to feed a grayscale image to ImageNet models. Is it possible to simply use a convolution with padding = 1, so that the spatial dimensions are preserved, and then feed the result to the model? Would that have any side effects?
It is certainly possible to use a conv layer to transform the grayscale input images into images containing 3 channels. To keep the spatial dimensions you could use e.g. a 1x1 kernel or any other setup with the appropriate padding (since you’ve mentioned a padding value of 1, I assume you would like to use a 3x3 kernel, which preserves the spatial size at a stride of 1).
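A minimal sketch of the two setups, assuming PyTorch's `nn.Conv2d` with its default stride of 1. The variable names, the batch size, and the 224x224 input are only illustrative choices, not something from the original question:

```python
import torch
import torch.nn as nn

# Two ways to map 1 input channel to 3 output channels
# while keeping height and width unchanged.
to_rgb_1x1 = nn.Conv2d(1, 3, kernel_size=1)             # 1x1 kernel, no padding needed
to_rgb_3x3 = nn.Conv2d(1, 3, kernel_size=3, padding=1)  # 3x3 kernel, stride defaults to 1

x = torch.randn(4, 1, 224, 224)  # dummy batch of grayscale images (illustrative size)

out_1x1 = to_rgb_1x1(x)
out_3x3 = to_rgb_3x3(x)

# Both outputs have 3 channels and the same spatial size as the input.
print(out_1x1.shape, out_3x3.shape)
```

Either output can then be passed to a model that expects 3-channel input.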
“Side effects” would be that you are adding more trainable parameters at the beginning of the model. Assuming you would like to train this layer, Autograd would need to backpropagate through the entire model to reach it, which could be avoided if you froze the pretrained model and only trained the classifier.
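The freezing option could look roughly like the sketch below. Note that `backbone` is a tiny hypothetical stand-in for a pretrained ImageNet model (so that the snippet runs without downloading weights), and `to_rgb`, `classifier`, the 10 output classes, and the learning rate are likewise just assumptions for illustration:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a pretrained feature extractor;
# in practice this would be the pretrained model without its classifier.
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
to_rgb = nn.Conv2d(1, 3, kernel_size=3, padding=1)  # grayscale -> 3 channels
classifier = nn.Linear(8, 10)                       # new head, 10 classes assumed

# Freeze the input conv and the backbone so only the classifier is trained.
for module in (to_rgb, backbone):
    for param in module.parameters():
        param.requires_grad_(False)

model = nn.Sequential(to_rgb, backbone, classifier)

# Only pass the trainable parameters to the optimizer.
optimizer = torch.optim.SGD(classifier.parameters(), lr=0.01)

x = torch.randn(2, 1, 32, 32)  # dummy grayscale batch
loss = model(x).sum()          # placeholder loss, just to trigger backward
loss.backward()

# Names of the parameters that still require gradients.
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)
```

Since nothing before the classifier requires gradients here, the backward pass stops at the classifier. Keep in mind that a frozen `to_rgb` layer stays at whatever weights it was initialized with.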
Thank you, I don’t think I want to train that layer so I should freeze it.