What is wrong with my HMDB51 dataloader?

I’m trying to prepare an HMDB51 dataset for some classification tasks. I have done this many times before in TensorFlow, but this time I am working in PyTorch, and I am running into a strange problem when creating my datasets:

import torchvision.transforms as T
import torchvision.datasets as datasets

val_split = 0.05
num_frames = 16  # frames per clip
clip_steps = 50
num_workers = 8
pin_memory = True

train_transforms = T.Compose([
    T.ToTensor(),
    T.Resize((128, 171)),
    T.RandomHorizontalFlip(),
    T.Normalize(mean=[0.43216, 0.394666, 0.37645], std=[0.22803, 0.22145, 0.216989]),
    T.CenterCrop((112, 112))
])

test_transforms = T.Compose([
    T.ToTensor(),
    T.Resize((128, 171)),
    T.Normalize(mean=[0.43216, 0.394666, 0.37645], std=[0.22803, 0.22145, 0.216989]),
    T.CenterCrop((112, 112))
])

hmdb51_train = datasets.HMDB51(
    'video_data/',
    'test_train_splits/',
    num_frames,
    step_between_clips=clip_steps,
    fold=1,
    train=True,
    transform=train_transforms,
    num_workers=num_workers
)

hmdb51_test = datasets.HMDB51(
    'video_data/',
    'test_train_splits/',
    num_frames,
    step_between_clips=clip_steps,
    fold=1,
    train=False,
    transform=test_transforms,
    num_workers=num_workers
)

total_train_samples = len(hmdb51_train)
total_val_samples = round(val_split * total_train_samples)

print(f"number of train samples {total_train_samples}")
print(f"number of validation samples {total_val_samples}")
print(f"number of test samples {len(hmdb51_test)}")

I have made sure that the HMDB51 dataset is indeed in ‘video_data’ and that the splits are also where they should be. Yet when I try to access a sample from the dataset in any way:

tensor = hmdb51_test.__getitem__(0)
#tensor_alternative = hmdb51_test[0][0]
#tensor_alternative_2 = hmdb51_test[0]

I simply get

"name": "TypeError",
"message": "pic should be PIL Image or ndarray. Got <class 'torch.Tensor'>"

That is fair enough, because ToTensor() only accepts a PIL Image or an ndarray. However, it makes me wonder what the point of the built-in HMDB51 dataset class is. Why do I get to specify the number of frames and the step between clips if I can’t even load the clips through my transforms? What is the correct way to load, for example, 16 frames of each .avi clip into a dataloader?

Edit: I also tried removing ‘ToTensor()’, which produces the following error instead:

Input tensor should be a float tensor. Got torch.uint8.

Thanks!