Using MViT as an encoder

Hello all,

I am trying to use video.mvit_v2_s as an encoder to get the encodings of frames. More specifically, I want to omit the “head” at the end of this model. Is there an easy way to do that?
I tried creating a class that inherits from the MViT class, but I get errors. Can anyone help me with this?

Thanks in advance

Probably the least amount of code would be to define a module whose forward simply returns its input, and replace the head with that:

import torch.nn as nn

class Fwd(nn.Module):
    def forward(self, x):
        return x  # pass the encoder output through unchanged

Then redefine the head:

model.head = Fwd()

Thank you for your answer. Is this the same as an identity function? That is, do I need to replace the previous classification head with an identity function?

No, it's not. Multiplying by an identity matrix takes up RAM and has some computation overhead.

The Fwd module above does essentially nothing: no memory increase at run time and no extra compute.

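Yes, just use nn.Identity(). It is not an identity matrix: its forward simply returns the input unchanged, so it behaves the same as the Fwd module above.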
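
A quick way to check this for yourself, as a small sketch: the tensor shape below is arbitrary and only there for illustration. Both modules hand back the very same tensor object, and nn.Identity holds no parameters.

```python
import torch
import torch.nn as nn


class Fwd(nn.Module):
    def forward(self, x):
        return x


x = torch.randn(2, 768)  # arbitrary stand-in for encoder output
fwd, ident = Fwd(), nn.Identity()

# Both return the input object itself, so nothing is copied or computed.
same_object = fwd(x) is x and ident(x) is x

# nn.Identity registers no weights, so there is no matrix to store.
n_params = sum(p.numel() for p in ident.parameters())

print(same_object, n_params)
```

So `model.head = nn.Identity()` and `model.head = Fwd()` should be interchangeable here; the built-in just saves you from defining the class.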