I3D ResNet action recognition model - expected input shape

Hello! I want to fine-tune the I3D action recognition model from Torch Hub, which is pre-trained on the 400 Kinetics classes, on a custom dataset with 4 possible output classes.

I’m loading the model and replacing the last layer like this:

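import torch

# Load the Kinetics-pretrained I3D R50 from the PyTorchVideo hub entry
model = torch.hub.load("facebookresearch/pytorchvideo", "i3d_r50", pretrained=True)
num_classes = 4
# Replace the final projection layer so the model outputs 4 classes
model.blocks[6].proj = torch.nn.Linear(2048, num_classes)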

I couldn’t find any documentation of the input shape the model expects.
Currently, I’m passing my data in this shape:

[batch_size, num_channels, num_images, width, height]

Is that correct?
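
To make the axis order I mean explicit, here is a minimal sketch that labels each dimension of a batch using the names from the layout above. `describe_batch` is a hypothetical helper of my own, not part of PyTorchVideo, and the sizes in the example (2 clips, 3 channels, 8 frames, 224x224) are placeholder assumptions rather than values taken from the model's documentation.

```python
# Axis names in the order I currently pass them to the model.
# This ordering is exactly what I am asking about; it is not confirmed.
AXES = ("batch_size", "num_channels", "num_images", "width", "height")


def describe_batch(shape):
    """Pair each axis name with its size so a layout mix-up is easy to spot.

    Hypothetical helper for illustration; not a PyTorchVideo function.
    """
    if len(shape) != len(AXES):
        raise ValueError(f"expected {len(AXES)} dimensions, got {len(shape)}")
    return dict(zip(AXES, shape))


# Placeholder sizes (assumptions): 2 clips, RGB, 8 frames, 224x224 pixels.
print(describe_batch((2, 3, 8, 224, 224)))
```

With a real tensor `x`, I would call it as `describe_batch(tuple(x.shape))` right before the forward pass.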