I’m trying to prepare the HMDB51 dataset for some image classification tasks. I have done this many times before in TF, but this time I am working in PyTorch. I am running into a strange problem when creating my datasets:
import torchvision.transforms as T
import torchvision.datasets as datasets
val_split = 0.05
num_frames = 16 # 16
clip_steps = 50
num_workers = 8
pin_memory = True
train_transforms = T.Compose([
    T.ToTensor(),
    T.Resize((128, 171)),
    T.RandomHorizontalFlip(),
    T.Normalize(mean=[0.43216, 0.394666, 0.37645], std=[0.22803, 0.22145, 0.216989]),
    T.CenterCrop((112, 112))
])
test_transforms = T.Compose([
    T.ToTensor(),
    T.Resize((128, 171)),
    T.Normalize(mean=[0.43216, 0.394666, 0.37645], std=[0.22803, 0.22145, 0.216989]),
    T.CenterCrop((112, 112))
])
hmdb51_train = datasets.HMDB51(
    'video_data/',
    'test_train_splits/',
    num_frames,
    step_between_clips=clip_steps,
    fold=1,
    train=True,
    transform=train_transforms,
    num_workers=num_workers
)
hmdb51_test = datasets.HMDB51(
    'video_data/',
    'test_train_splits/',
    num_frames,
    step_between_clips=clip_steps,
    fold=1,
    train=False,
    transform=test_transforms,
    num_workers=num_workers
)
total_train_samples = len(hmdb51_train)
total_val_samples = round(val_split * total_train_samples)
print(f"number of train samples {total_train_samples}")
print(f"number of validation samples {total_val_samples}")
print(f"number of test samples {len(hmdb51_test)}")
I have made sure that the HMDB51 dataset is indeed in ‘video_data’ and that the splits are also where they should be. Yet when I try to access a sample from the dataset in any way:
tensor = hmdb51_test.__getitem__(0)
#tensor_alternative = hmdb51_test[0][0]
#tensor_alternative_2 = hmdb51_test[0]
I simply get
"name": "TypeError",
"message": "pic should be PIL Image or ndarray. Got <class 'torch.Tensor'>"
That is fair enough, because ToTensor() only accepts a PIL Image or an ndarray. However, it makes me wonder what the point of the built-in HMDB51 Dataset class is. Why do I get to specify the number of frames and the clip steps if I can’t even load the resulting clips from the dataset class? What is the correct way to load, e.g., 16 frames of each .AVI clip into a dataloader?
Edit: I also tried removing ‘ToTensor()’, which produces the following error:
Input tensor should be a float tensor. Got torch.uint8.
Thanks!