Feeding single channel images to three channel pre-trained model

Problem

I am trying to use the pretrained ResNet model provided in torchvision.models, which is trained on ImageNet, where images are RGB. However, my dataset consists of CT scan images, which are grayscale. I am wondering how to feed my data into this model so that I can fine-tune its parameters.

There are several options I have thought of:

  • Fill the other two channels with all 0’s or all 1’s.
  • Copy the single channel three times (a sketch of both options follows this list).
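
Here is a minimal sketch of both options as tensor operations, assuming the grayscale input is a 1×H×W tensor (the function name and sizes are just illustrative):

```python
import torch

def expand_channels(gray, mode="copy"):
    """Turn a 1xHxW grayscale tensor into a 3xHxW tensor.

    mode="copy" repeats the single channel three times;
    mode="zeros" / mode="ones" keeps the image in one channel
    and fills the other two with a constant.
    """
    if mode == "copy":
        return gray.repeat(3, 1, 1)
    fill = torch.zeros_like(gray) if mode == "zeros" else torch.ones_like(gray)
    return torch.cat([gray, fill, fill], dim=0)

x = torch.rand(1, 224, 224)                # stand-in for one grayscale image
print(expand_channels(x, "copy").shape)    # torch.Size([3, 224, 224])
```

In a torchvision pipeline, the copy variant can also be written as `transforms.Lambda(lambda x: x.repeat(3, 1, 1))` placed after `transforms.ToTensor()`.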

I tried these with the MNIST dataset and got the results below, where the first image corresponds to copying the channel three times, while the other two correspond to padding with all 1’s and padding with all 0’s, respectively.

From this, it seems that copying the single channel three times makes more sense, since the result looks essentially the same as the original image. But I am not sure whether anyone has encountered this problem before and found an empirically reasonable solution.

Thank you in advance!

It’s difficult because you are applying a human semantic perspective in assuming that cloned channels “look the same”. It’s better than filling the channels with random numbers, but I don’t think it will provide good results on its own. You should fine-tune the network.


I tried two methods on MNIST images generated by copying the single channel 3 times:

  • Just change the output layer but freeze all previous layers.
  • Change the output layer and fine-tune all other layers.

The additional fine-tuning gives about a 4% accuracy boost, from around 95% to 99% per-category accuracy.
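
For reference, here is a minimal sketch of how the two setups differ in code, assuming ResNet-18 from torchvision and the 10 MNIST classes (not necessarily the exact script I used):

```python
import torch.nn as nn
from torchvision import models

model = models.resnet18(pretrained=True)

# Setup 1: freeze all pretrained layers so only the new head is trained.
for param in model.parameters():
    param.requires_grad = False

# Replace the output layer; a freshly constructed module has
# requires_grad=True, so it is the only part that gets updated.
model.fc = nn.Linear(model.fc.in_features, 10)

# Setup 2: fine-tune everything -- just skip the freezing loop above
# so the optimizer updates all layers, not only the new head.
```

When freezing, pass only the trainable parameters to the optimizer, e.g. `optim.SGD(filter(lambda p: p.requires_grad, model.parameters()), lr=0.01)`, since some PyTorch versions reject parameters that don’t require gradients.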