def forward(self, x, labels=None):
    x = self.model(x)
    x = self.dense(x)
    loss = None
    if labels is not None:
        labels = torch.reshape(labels, (x.shape[0], 1)).float()
        # BCEWithLogitsLoss applies the sigmoid internally, so pass it the raw logits
        loss = torch.nn.BCEWithLogitsLoss()(x, labels)
    x = self.sigmoid(x)
    return x, loss
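As an aside on the loss above: `BCEWithLogitsLoss` expects raw logits and applies the sigmoid internally, so applying `self.sigmoid` before it would silently compute the wrong loss. A minimal pure-Python sketch of the arithmetic (not torch code) showing the mismatch:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce(p, y):
    # binary cross-entropy on a probability p (what BCELoss computes)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def bce_with_logits(z, y):
    # numerically stable BCE on a raw logit z (what BCEWithLogitsLoss computes)
    return max(z, 0.0) - z * y + math.log(1.0 + math.exp(-abs(z)))

z, y = 1.5, 1.0
print(abs(bce_with_logits(z, y) - bce(sigmoid(z), y)) < 1e-9)   # True: same loss
print(abs(bce_with_logits(z, y) - bce(sigmoid(sigmoid(z)), y))) # large: double sigmoid is wrong
```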
After this, I run this piece of code to do some quick testing, and I get this error:
model = Model().to("cpu")
for i in range(1, 100):
    try:
        y = model(torch.ones(2, 3, i, 224, 224).to("cpu"))
    except Exception as error:
        print(error)

ERROR - shape '[2, 96, 8, 56, 56]' is invalid for input of size 602112
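For what it's worth, the numbers in that error decode directly. The reshape target 2×96×8×56×56 is derived from the hardcoded 16-frame clip (16/2 = 8 along time, 224/4 = 56 spatially, assuming a stride-(2, 4, 4) patch embedding), while 602112 = 2×3×2×224×224, i.e. an input clip of only 2 frames. A quick arithmetic check (a sketch, not the library's code):

```python
from math import prod

# Reshape target from the error: batch 2, 96 channels, 16/2 frames, 224/4 pixels per side
target = (2, 96, 8, 56, 56)
print(prod(target))           # elements the reshape expects

# The failing input: batch 2, 3 channels, only 2 frames of 224x224
print(2 * 3 * 2 * 224 * 224)  # 602112, matching the error message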
mvit_v2_s seems to be a torchvision.models.video model, described here, which expects inputs as follows:
Accepts batched (B, T, C, H, W) and single (T, C, H, W) video frame torch.Tensor objects. The frames are resized to resize_size= using interpolation=InterpolationMode.BILINEAR, followed by a central crop of crop_size=[224, 224]. Finally the values are first rescaled to [0.0, 1.0] and then normalized using mean=[0.45, 0.45, 0.45] and std=[0.225, 0.225, 0.225]. Finally the output dimensions are permuted to (..., C, T, H, W) tensors.
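The rescale-then-normalize step in that description amounts to the following per-pixel arithmetic (a sketch of the math, not the torchvision implementation):

```python
def normalize_pixel(v, mean=0.45, std=0.225):
    # v is a raw uint8 value in 0..255; it is first rescaled to [0.0, 1.0],
    # then standardized with the documented per-channel mean and std
    return (v / 255.0 - mean) / std

print(normalize_pixel(0))    # -2.0
print(normalize_pixel(255))  # ~2.444
```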
This model can be used for video recognition. Looking at the input (2, 3, i, 224, 224), i here is the frame length, and according to your answer 16 is the clip_len… so can I use another value for clip_len other than 16?
Nope. This is hardcoded here. You can open an issue if you need other clip lengths, so that this value becomes settable in the builder of the model. Note, however, that this will not change the fact that our weights are trained for 16 frames, and that won't change. Meaning, you will have to train the model yourself.
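Until then, a common workaround is to resample each video to exactly 16 frames before calling the model. A minimal index-sampling sketch (uniform_frame_indices is a hypothetical helper, not a torchvision API):

```python
def uniform_frame_indices(num_frames, clip_len=16):
    # Evenly spaced frame indices covering the whole video; shorter videos
    # simply repeat frames so the result is always clip_len long
    return [min(int(i * num_frames / clip_len), num_frames - 1)
            for i in range(clip_len)]

print(uniform_frame_indices(100)[:4])  # [0, 6, 12, 18]
print(uniform_frame_indices(4))        # each of the 4 frames repeated 4 times
```

With a (B, T, C, H, W) tensor you would then select `video[:, uniform_frame_indices(video.shape[1])]` before applying the transforms.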
For completeness, here is a full example of how to use the model:
import torch
from torchvision import models

name = "MViT_V2_S"
builder = models.get_model_builder(name)
weights = models.get_model_weights(name).DEFAULT
model = builder(weights=weights)
transform = weights.transforms()
input = torch.ones(2, 16, 3, 224, 224)  # (B, T, C, H, W); the transform permutes to (B, C, T, H, W)
result = model(transform(input))