Video feature extraction using a pre-trained ResNet3D model

Dear all,
I'm new to PyTorch and I need to use a pre-trained ResNet3D model for video classification.
In TensorFlow, you just remove the classification layer, create a new head with your custom classes and train the model.
Does anyone have an idea, or a tutorial, on how to do this with PyTorch?
Thanks in advance 🙂

Hello,
you could do it as follows:

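```python
import torch.nn as nn
import torchvision

# R3D-18 pre-trained on Kinetics-400 (torchvision >= 0.13; older versions use pretrained=True)
pretrained_model = torchvision.models.video.r3d_18(
    weights=torchvision.models.video.R3D_18_Weights.DEFAULT)
# .children() yields only the top-level blocks (stem, layer1-layer4, avgpool, fc);
# .modules() would also yield the model itself and every nested layer, duplicating them.
modules = list(pretrained_model.children())

# Keep everything up to (but not including) the final fc classification layer.
pretrained_sequential = nn.Sequential()
for i, module in enumerate(modules[:-1]):
    pretrained_sequential.add_module('module' + str(i), module)
```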

You can decide up to which layer you want to add to your sequential block. For that, you might want to look at how ResNet3D (`VideoResNet`) is implemented in torchvision.

I think it is necessary to remove the last (classification) layer before adding the sequential block (the new head)?

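When you use the `.children()` method, you get the top-level blocks of the network (for R3D-18: `stem`, `layer1` to `layer4`, `avgpool` and `fc`), while `.modules()` recursively returns every module, including the model itself and all nested layers. It is then up to you which ones you want to keep and which ones you don't. You can check the implementation of the model or simply print the list to see everything that is present.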

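Yes, the last layer (`fc`) is the classification layer, and if you want to add another convolution block on top, you will need to remove it: it outputs a flat vector of class scores, not a feature map that a convolution can work on.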

Otherwise, removing it is not compulsory: for example, you could add another linear layer on top of the classification layer.