Fine-tune I3D model on a custom dataset

Hello! I want to fine-tune the I3D model for action recognition from torch hub, which is pre-trained on Kinetics 400 classes, on a custom dataset, where I have 4 possible output classes.

I’m loading the model and modifying the last layer by:

model = torch.hub.load("facebookresearch/pytorchvideo", "i3d_r50", pretrained=True)
num_classes = 4
model.blocks[6].proj = torch.nn.Linear(2048, num_classes)

I defined the __getitem__ method of my Dataset to return:

def __getitem__(self, ind):
    [...]
    return processed_images, target

where processed_images and target are Tensors, with shapes:

>>> processed_images.shape
torch.Size([5, 224, 224, 3])

>>> target.shape
torch.Size([4])

Basically, processed_images is a sequence of 5 RGB images, each with shape (224, 224), while target is the one-hot encoding for the target classes.

In the training part, I have:

model.train()
model.to(device)
train_dataloader = torch.utils.data.DataLoader(
        train_dataset,
        batch_size=batch_size,
        shuffle=True,
        drop_last=False,
        persistent_workers=False,
        timeout=0,
    )

for epoch in range(number_of_epochs):
    for batch_ind, batch_data in enumerate(train_dataloader):
        # Extract data and label
        datas, labels = batch_data

        # move to device
        datas_ = datas.to(device)
        labels_ = labels.to(device)
        weights_ = weights.to(device)

        # permute axes: [22, 5, 224, 224, 3] -> [22, 3, 5, 224, 224]
        datas_ = datas_.permute(0, 4, 1, 2, 3)
        preds_ = model(datas_)

But I’m getting an error in the forward method of ResNetBasicHead:

Exception has occurred: RuntimeError
input image (T: 2 H: 14 W: 14) smaller than kernel size (kT: 4 kH: 7 kW: 7)
  File "/home/c.demasi/.cache/torch/hub/facebookresearch_pytorchvideo_main/pytorchvideo/models/head.py", line 374, in forward
    x = self.pool(x)
  File "/home/c.demasi/.cache/torch/hub/facebookresearch_pytorchvideo_main/pytorchvideo/models/net.py", line 43, in forward
    x = block(x)
  File "/home/c.demasi/work/projects/ball_shot_action_detection_dev_environment/src/train_torch.py", line 271, in train
    preds_ = model(datas_)
  File "/home/c.demasi/work/projects/ball_shot_action_detection_dev_environment/src/train_torch.py", line 571, in train_roi
    train(training_parameters, train_from_existing_path=None, perform_tests=perform_tests, config=config)
  File "/home/c.demasi/work/projects/ball_shot_action_detection_dev_environment/train.py", line 13, in <module>
    train_roi(config=config, perform_tests=False)
RuntimeError: input image (T: 2 H: 14 W: 14) smaller than kernel size (kT: 4 kH: 7 kW: 7)

Any suggestion?

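Because of their pooling and conv layers, CNNs typically have a minimum viable input size.
The error says that an intermediate activation is too small: by the time it reaches the pooling layer in the head, it has only 2 time steps (T: 2), while the pooling kernel needs 4 (kT: 4). The spatial size (14 x 14) is large enough for the 7 x 7 kernel, so the third (T) dimension is the problem: a clip of 5 frames does not meet I3D’s minimal input size.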
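To make the arithmetic concrete, here is a minimal sketch in plain Python. It assumes that the only temporal downsampling before the head is a single unpadded pooling layer with kernel 2 and stride 2. That assumption is consistent with 5 frames turning into T: 2 in your error, but it is not confirmed in this thread, so it is worth checking by printing the model. The helper names (pooled_length, frames_reaching_head, min_frames, resample_indices) are made up for illustration.

```python
# ASSUMPTIONS (not confirmed by the thread): one temporal pooling layer with
# kernel 2 / stride 2 sits before the head. kT = 4 is taken from the error message.
TEMPORAL_POOL_KERNEL = 2
TEMPORAL_POOL_STRIDE = 2
HEAD_KERNEL_T = 4


def pooled_length(n, kernel, stride):
    """Output length of an unpadded pooling layer along one axis."""
    if n < kernel:
        return 0
    return (n - kernel) // stride + 1


def frames_reaching_head(num_frames):
    """Temporal size that arrives at the head's pooling layer (under the assumption above)."""
    return pooled_length(num_frames, TEMPORAL_POOL_KERNEL, TEMPORAL_POOL_STRIDE)


def min_frames():
    """Smallest clip length whose temporal size at the head is at least kT."""
    n = 1
    while frames_reaching_head(n) < HEAD_KERNEL_T:
        n += 1
    return n


def resample_indices(num_frames, target_frames):
    """Evenly spaced frame indices (rounded half up); frames repeat when upsampling."""
    if target_frames == 1:
        return [0]
    denom = 2 * (target_frames - 1)
    return [
        (2 * i * (num_frames - 1) + (target_frames - 1)) // denom
        for i in range(target_frames)
    ]


if __name__ == "__main__":
    print(frames_reaching_head(5))   # temporal size at the head for a 5-frame clip
    print(min_frames())              # shortest clip that satisfies kT
    print(resample_indices(5, min_frames()))
```

Under these assumptions, a 5-frame clip arrives at the head with T = 2, and the shortest clip that works is 8 frames. If you cannot sample more real frames from the video, one stopgap is to index the clip with the resampled indices in __getitem__ (for example processed_images[idx], with idx the list returned by resample_indices) so that some frames are repeated. Sampling more real frames is probably the better option when the data allows it.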

Best regards

Thomas
