def forward(self, x, labels=None):
    x = self.model(x)
    x = self.dense(x)
    loss = None
    if labels is not None:
        labels = torch.reshape(labels, (x.shape[0], 1)).float()
        # BCEWithLogitsLoss applies the sigmoid internally, so pass it the raw logits
        loss = torch.nn.BCEWithLogitsLoss()(x, labels)
    x = self.sigmoid(x)
    return x, loss
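As an aside on the loss above: `BCEWithLogitsLoss` expects raw logits and applies the sigmoid internally, so applying `self.sigmoid` before it would silently compute the wrong loss. A minimal pure-Python sketch of the arithmetic (not torch code) showing the mismatch:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce(p, y):
    # binary cross-entropy on a probability p (what BCELoss computes)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def bce_with_logits(z, y):
    # numerically stable BCE on a raw logit z (what BCEWithLogitsLoss computes)
    return max(z, 0.0) - z * y + math.log(1.0 + math.exp(-abs(z)))

z, y = 1.5, 1.0
print(abs(bce_with_logits(z, y) - bce(sigmoid(z), y)) < 1e-9)   # True: same loss
print(abs(bce_with_logits(z, y) - bce(sigmoid(sigmoid(z)), y))) # large: double sigmoid is wrong
```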
After this, I run this piece of code to do some quick testing, and I get this error:
model = Model().to("cpu")
for i in range(1, 100):
    try:
        y = model(torch.ones(2, 3, i, 224, 224).to("cpu"))
    except Exception as error:
        print(error)

ERROR - shape '[2, 96, 8, 56, 56]' is invalid for input of size 602112
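For what it's worth, the numbers in that error decode directly. The reshape target 2×96×8×56×56 is derived from the hardcoded 16-frame clip (16/2 = 8 along time, 224/4 = 56 spatially, assuming a stride-(2, 4, 4) patch embedding), while 602112 = 2×3×2×224×224, i.e. an input clip of only 2 frames. A quick arithmetic check (a sketch, not the library's code):

```python
from math import prod

# Reshape target from the error: batch 2, 96 channels, 16/2 frames, 224/4 pixels per side
target = (2, 96, 8, 56, 56)
print(prod(target))           # elements the reshape expects

# The failing input: batch 2, 3 channels, only 2 frames of 224x224
print(2 * 3 * 2 * 224 * 224)  # 602112, matching the error message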
mvit_v2_s seems to be a torchvision.models.video model, described here, which expects inputs as follows:
Accepts batched (B, T, C, H, W) and single (T, C, H, W) video frame torch.Tensor objects. The frames are resized to resize_size= using interpolation=InterpolationMode.BILINEAR, followed by a central crop of crop_size=[224, 224]. Finally the values are first rescaled to [0.0, 1.0] and then normalized using mean=[0.45, 0.45, 0.45] and std=[0.225, 0.225, 0.225]. Finally the output dimensions are permuted to (..., C, T, H, W) tensors.
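The rescale-then-normalize step in that description amounts to the following per-pixel arithmetic (a sketch of the math, not the torchvision implementation):

```python
def normalize_pixel(v, mean=0.45, std=0.225):
    # v is a raw uint8 value in 0..255; it is first rescaled to [0.0, 1.0],
    # then standardized with the documented per-channel mean and std
    return (v / 255.0 - mean) / std

print(normalize_pixel(0))    # -2.0
print(normalize_pixel(255))  # ~2.444
```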
This model can be used for video recognition. Looking at the input (2, 3, i, 224, 224), i here is the frame length, and according to your answer 16 is the clip_len… so can I use another value for clip_len other than 16?
Nope. This is hardcoded here. You can open an issue if you need other clip lengths, so that this value becomes settable in the builder of the model. Note, however, that this will not change the fact that our weights are trained for 16 frames, and that won't change. Meaning, you will have to train the model yourself.
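Until then, a common workaround is to resample each video to exactly 16 frames before calling the model. A minimal index-sampling sketch (uniform_frame_indices is a hypothetical helper, not a torchvision API):

```python
def uniform_frame_indices(num_frames, clip_len=16):
    # Evenly spaced frame indices covering the whole video; shorter videos
    # simply repeat frames so the result is always clip_len long
    return [min(int(i * num_frames / clip_len), num_frames - 1)
            for i in range(clip_len)]

print(uniform_frame_indices(100)[:4])  # [0, 6, 12, 18]
print(uniform_frame_indices(4))        # each of the 4 frames repeated 4 times
```

With a (B, T, C, H, W) tensor you would then select `video[:, uniform_frame_indices(video.shape[1])]` before applying the transforms.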
For completeness, here is a full example of how to use the model:
import torch
from torchvision import models

name = "MViT_V2_S"
builder = models.get_model_builder(name)
weights = models.get_model_weights(name).DEFAULT
model = builder(weights=weights)
transform = weights.transforms()
input = torch.ones(2, 16, 3, 224, 224)  # (B, T, C, H, W); the transform permutes to (B, C, T, H, W)
result = model(transform(input))