This seems to be working. I am verifying it with:

summary(model, (1, 224, 224))

This is returning the summary correctly.

However, one thing I am confused about: currently, the weights of the first convolutional layer are initialised randomly. Is there a way I can initialise the first layer from the weights of the pretrained 3-channel PyTorch model? One way to do it would be to take the mean of the weights from the pretrained model.

The shape of the weight tensor of the first conv layer for 3 channels is torch.Size([64, 3, 7, 7]), whereas for 1 channel it is torch.Size([64, 1, 7, 7]). Does taking the mean across the second (input-channel) dimension make sense?

Hi @Adit_Whorra,
I think there are a couple of solutions you can explore:

Adapt the input. You can convert the 1-channel image to a 3-channel one (just by replicating the channel).
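A minimal sketch of the first option, assuming a standard (N, C, H, W) batch layout:

```python
import torch

# Hypothetical 1-channel input batch of shape (N, 1, H, W)
x = torch.randn(8, 1, 224, 224)

# Replicate the single channel to get a 3-channel input (N, 3, H, W),
# which the unmodified pretrained model can consume directly.
x3 = x.repeat(1, 3, 1, 1)  # or x.expand(-1, 3, -1, -1) to avoid a copy

print(x3.shape)  # torch.Size([8, 3, 224, 224])
```

`expand` returns a view without allocating new memory, which is usually preferable when the tensor is only read by the first conv layer.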

Adapt the model's first layer. The solution you suggested. To initialize your first layer, you can try to use the mean or the sum over the channel dimension of the 3-channel pretrained weights.
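A sketch of the second option. Here a randomly initialised `nn.Conv2d(3, 64, ...)` stands in for the pretrained first layer (in a real model, e.g. a torchvision ResNet, you would take `model.conv1.weight` instead); the hyperparameters match ResNet's conv1:

```python
import torch
import torch.nn as nn

# Stand-in for the pretrained 3-channel first layer, weight shape [64, 3, 7, 7]
conv3 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)

# New 1-channel layer with identical hyperparameters, weight shape [64, 1, 7, 7]
conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)

with torch.no_grad():
    # Average the pretrained kernels over the input-channel dimension (dim=1):
    # [64, 3, 7, 7] -> [64, 1, 7, 7]. Use .sum(...) instead of .mean(...) for
    # the sum variant.
    conv1.weight.copy_(conv3.weight.mean(dim=1, keepdim=True))

print(conv1.weight.shape)  # torch.Size([64, 1, 7, 7])
```

You would then assign the new layer back into the model (e.g. `model.conv1 = conv1`) before training or evaluating.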

Evaluate both solutions on the ImageNet dataset, then decide.

The first solution will unnecessarily add extra parameters, so I think the second option makes more sense. After taking the mean or sum, will the resultant weights still retain what the model learnt during pre-training, though?
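One way to see the connection between the two options: for a grayscale image replicated across 3 channels, the conv output per filter is (w_0 + w_1 + w_2) * g, so a 1-channel layer initialised with the channel-wise sum reproduces the pretrained layer's response on such inputs exactly, and the mean reproduces it scaled by 1/3. A sanity-check sketch, again with random weights standing in for the pretrained ones:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for the pretrained 3-channel first layer
conv3 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)

# 1-channel layer initialised with the sum over input channels
conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
with torch.no_grad():
    conv1.weight.copy_(conv3.weight.sum(dim=1, keepdim=True))

g = torch.randn(2, 1, 224, 224)    # grayscale batch
y3 = conv3(g.repeat(1, 3, 1, 1))   # 3-channel layer on replicated input
y1 = conv1(g)                      # adapted 1-channel layer on the original input

# Identical responses up to float rounding: the sum preserves exactly what
# the pretrained layer computes on replicated grayscale inputs.
print(torch.allclose(y3, y1, atol=1e-4))  # True
```

So the adapted weights are not arbitrary: on grayscale inputs they match (sum) or are a constant rescaling of (mean) what the pretrained layer would have computed; for the mean variant the downstream batch-norm absorbs much of the scale difference, and fine-tuning handles the rest.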