RuntimeError: Expected 5-dimensional input for 5-dimensional weight 45 3 1 7 7, but got 4-dimensional input of size [117, 720, 1280, 3] instead

I am trying to work on a video dataset. When I read a video using the function torchvision.io.read_video, I get the output as a tuple (I only want to work with the video frames).
If I try to pass the frame tensors from the tuple one by one to r2plus1d=models.video.r2plus1d_18(pretrained=False, progress=True),
I get this error.

Is this not the correct way to work with videos? If it is, how should I modify this?

Based on the error message, a 5D input tensor is expected. As described in the docs, the tensor should have the shape [batch_size, 3, T, H, W], where T represents the number of frames, while H and W are set to 112 in the pretrained models.
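As a sketch of the reshaping (assuming read_video's documented frame layout of [T, H, W, C] in uint8, and a clip already resized to 112×112; the tensor here is random stand-in data, not a real video):

```python
import torch

# Stand-in for the first element of torchvision.io.read_video's output:
# frames with shape [T, H, W, C], dtype uint8 (here T=8, H=W=112)
frames = torch.randint(0, 256, (8, 112, 112, 3), dtype=torch.uint8)

# Scale to [0, 1] floats and move channels first: [T, H, W, C] -> [C, T, H, W]
clip = frames.permute(3, 0, 1, 2).float() / 255.0

# Add the batch dimension the model expects: [C, T, H, W] -> [1, C, T, H, W]
clip = clip.unsqueeze(0)

print(clip.shape)  # torch.Size([1, 3, 8, 112, 112])
```

A batch of clips would stack several such tensors along dim 0; in practice you would also normalize with the dataset's mean/std before feeding the model.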
