Transfer Learning using a PreTrained Model

Hi,
I am implementing transfer learning using pretrained weights. I am only doing feature extraction, not fine-tuning the entire model, and I have re-initialized the final linear layer of a pretrained ResNet50.
My inputs are videos.
I get the following runtime error:
RuntimeError: Expected 4-dimensional input for 4-dimensional weight [64, 3, 7, 7], but got 5-dimensional input of size [30, 60, 3, 224, 224] instead.

(batch size is 30, sample duration (temporal duration of the inputs) is 60)
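For reference, a minimal sketch that reproduces the error (a rough reconstruction of the described setup, not the exact code; the 10 output classes are a placeholder):

```python
import torch
import torch.nn as nn
from torchvision import models

# Feature extraction setup: freeze the backbone, replace the final linear layer.
model = models.resnet50(pretrained=True)
for param in model.parameters():
    param.requires_grad = False                   # freeze the backbone
model.fc = nn.Linear(model.fc.in_features, 10)    # 10 classes is a placeholder

# Video batch: [batch, frames, channels, height, width]
x = torch.randn(30, 60, 3, 224, 224)
out = model(x)   # raises the RuntimeError above, since conv1 expects a 4D input
```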

Can anyone please help me resolve this issue?

Since you are also changing the input shape (from the original 4-dimensional inputs to 5-dimensional ones), you would need to change at least the first conv layer (e.g. by replacing it with nn.Conv3d, which accepts 5-dimensional inputs). However, I’ve mentioned “at least” since the output of this new conv layer would also be 5-dimensional, so if you want to keep this activation shape you would need to replace the entire feature extractor.
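To illustrate why swapping only the first layer is not enough, here is a minimal sketch (the Conv3d kernel size, stride, and padding are arbitrary placeholders, not a recommendation):

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet50(pretrained=True)

# Replace only the first conv layer with a 3D convolution so that it
# accepts 5-dimensional inputs ([batch, channels, frames, height, width]).
model.conv1 = nn.Conv3d(3, 64, kernel_size=(3, 7, 7),
                        stride=(1, 2, 2), padding=(1, 3, 3), bias=False)

x = torch.randn(30, 60, 3, 224, 224)   # [batch, frames, channels, H, W]
x = x.permute(0, 2, 1, 3, 4)           # [batch, channels, frames, H, W] for Conv3d
out = model.conv1(x)
print(out.shape)                       # e.g. torch.Size([30, 64, 60, 112, 112]) -> still 5D
# model.bn1 is an nn.BatchNorm2d and the following layers are 2D as well,
# so this 5D activation would fail there -- hence "at least" the first layer.
```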
I’m not familiar with your use case, but maybe iterating over the input could also work, i.e. indexing the input in the temporal dimension, passing each slice to the model, and concatenating the outputs afterwards before feeding them to the classifier.
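A rough sketch of this second idea, assuming a frozen ResNet50 backbone with its fc layer replaced by nn.Identity and a hypothetical linear classifier on top (num_classes and the temporal mean pooling are just placeholder choices):

```python
import torch
import torch.nn as nn
from torchvision import models

# Frozen ResNet50 backbone used as a per-frame feature extractor.
backbone = models.resnet50(pretrained=True)
backbone.fc = nn.Identity()                     # now returns [N, 2048] features
for param in backbone.parameters():
    param.requires_grad = False

num_classes = 10                                # placeholder
classifier = nn.Linear(2048, num_classes)       # hypothetical classifier head

x = torch.randn(30, 60, 3, 224, 224)            # [batch, frames, channels, H, W]

features = []
for t in range(x.size(1)):                      # index the temporal dimension
    frame = x[:, t]                             # [30, 3, 224, 224], a valid 4D input
    features.append(backbone(frame))            # [30, 2048]

features = torch.stack(features, dim=1)         # [30, 60, 2048]
clip_features = features.mean(dim=1)            # temporal pooling (one possible choice)
out = classifier(clip_features)                 # [30, num_classes]
```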

Thanks ptrblck. It helped.