Hi,
I’m trying to replace mc3_18 with mvit_v2_s from torchvision.models.video, but I’m getting a tensor shape error. Here is my code:
import torch
import torch.nn as nn
from torchvision import models
# base = models.video.mc3_18(weights='DEFAULT', progress=True)
base = models.video.mvit_v2_s(weights='DEFAULT', progress=True)
base = nn.Sequential(*list(base.children())[:-1])
batch_size = 4 # Adjust as needed
dummy_input = torch.randn(batch_size, 3, 16, 224, 224)
output = base(dummy_input)
print("Model input shape:", dummy_input.shape)
print("Model output shape:", output.shape)
Without wrapping all the layers except the last in nn.Sequential, the mvit_v2_s model works. I expected the wrapped version to work too, since mc3_18 works with nn.Sequential and gives an output of shape [4, 512, 1, 1, 1]. With mvit_v2_s, however, nn.Sequential raises the following error, which I’m unable to debug:
def forward(self, x: torch.Tensor) -> torch.Tensor:
411 class_token = self.class_token.expand(x.size(0), -1).unsqueeze(1)
--> 412 x = torch.cat((class_token, x), dim=1)
413
414 if self.spatial_pos is not None and self.temporal_pos is not None and self.class_pos is not None:
RuntimeError: Tensors must have same number of dimensions: got 3 and 5
Is there any other way to use mvit_v2_s for feature extraction?