I need to extract features from the HMDB51/UCF101 datasets for a video classification task using a pretrained 3D CNN. From what I understand, the dataloaders available in PyTorch divide each video into a certain number of subclips (which I cannot set), separated by x frames (which I can set), and each subclip is made up of a fixed number of frames (which I can also set).
Let’s assume that I use the following code:
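from torchvision import transforms
from torchvision.datasets import HMDB51
# Assumption: T is the video transforms module from torchvision's
# references/video_classification folder (it provides ToFloatTensorInZeroOne).
import transforms as T

root = "<path_to_videos>"
annotation_path = "<path_to_annotations>"
frames_per_clip = 32        # frames in each clip
step_between_clips = 50     # frames between the starts of consecutive clips
fold = 1
num_workers = 12

# Per-channel normalization statistics
norm_value = 255
normalize = T.Normalize(mean=[114.7748 / norm_value, 107.7354 / norm_value, 99.4750 / norm_value],
                        std=[0.22803, 0.22145, 0.216989])

height, width = 224, 224
transform_test = transforms.Compose([
    T.ToFloatTensorInZeroOne(),
    T.Resize((height, width)),
    normalize
])

dataset_test = HMDB51(root, annotation_path, frames_per_clip,
                      step_between_clips=step_between_clips, fold=fold, train=False,
                      transform=transform_test, num_workers=num_workers)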
The above code produces 2528 datapoints (clips) for test split 1 of HMDB51. I would like to average the predictions for the clips belonging to the same video, so that I can measure the accuracy of my classifier at the video level and not only at the clip level.
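The averaging step described above can be sketched in plain Python. This is only a minimal illustration: `average_by_video`, `video_accuracy`, and the toy inputs are hypothetical names and data, and the sketch assumes that a video index and a label are available for every clip (for example, collected while iterating over the dataset in order).

```python
from collections import defaultdict


def average_by_video(clip_scores, clip_video_ids):
    """Average per-clip score vectors over all clips sharing a video id."""
    sums = {}
    counts = defaultdict(int)
    for scores, vid in zip(clip_scores, clip_video_ids):
        if vid not in sums:
            sums[vid] = [0.0] * len(scores)
        sums[vid] = [a + b for a, b in zip(sums[vid], scores)]
        counts[vid] += 1
    return {vid: [s / counts[vid] for s in total] for vid, total in sums.items()}


def video_accuracy(clip_scores, clip_video_ids, clip_labels):
    """Fraction of videos whose averaged prediction matches the label."""
    averaged = average_by_video(clip_scores, clip_video_ids)
    # Every clip of a video carries the same label, so one per video id suffices.
    label_of = dict(zip(clip_video_ids, clip_labels))
    correct = 0
    for vid, scores in averaged.items():
        predicted = max(range(len(scores)), key=scores.__getitem__)
        correct += predicted == label_of[vid]
    return correct / len(averaged)


# Toy data (hypothetical): two videos with ids 0 and 3, two clips each.
clip_scores = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.6, 0.4]]
clip_video_ids = [0, 0, 3, 3]
clip_labels = [0, 0, 1, 1]

print(video_accuracy(clip_scores, clip_video_ids, clip_labels))  # 0.5
```

Keying the averages by video id rather than by position means that gaps in the video indices do not shift the labels.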
To do so, I thought about using dataset_test.video_clips.get_clip_location() to get the video indices, and then picking the labels in order according to the video index. The loader is not shuffled, so the video with index 0 gets the first label, the next one gets the second, and so on.
In doing so, I noticed that some videos are missing. Suppose that I try this:
for i in range(300, 303):
    print(i, dataset_test.video_clips.get_clip_location(i))
This gives me:
300 (137, 0)
301 (138, 0)
302 (141, 0)
Where are videos 139 and 140? Moreover, if I call dataset_test.video_clips.cumulative_sizes[137:142], I get [301, 302, 302, 302, 303]. Why are three of these numbers equal?
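One reading consistent with these numbers is that cumulative_sizes is a running total of clips per video, and that a clip index is mapped to a video by a right bisection into that list. Under that assumption, a repeated value means the corresponding video contributed zero clips, so no clip index can ever land on it. The sketch below reproduces the pattern with toy numbers; `clip_location` is a hypothetical stand-in, not the library function itself.

```python
import bisect
from itertools import accumulate


def clip_location(cumulative_sizes, idx):
    """Map a global clip index to (video index, clip index within the video)."""
    # First video whose cumulative clip count exceeds idx.
    video_idx = bisect.bisect_right(cumulative_sizes, idx)
    start = cumulative_sizes[video_idx - 1] if video_idx > 0 else 0
    return video_idx, idx - start


# Toy example: videos 2 and 3 yield no clips at all.
clips_per_video = [1, 1, 0, 0, 1]
cumulative_sizes = list(accumulate(clips_per_video))  # [1, 2, 2, 2, 3]

for i in range(3):
    print(i, clip_location(cumulative_sizes, i))
# 0 (0, 0)
# 1 (1, 0)
# 2 (4, 0)
```

The toy output skips videos 2 and 3 in the same way that the real output skips videos 139 and 140.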
Also, what happens if a video is shorter than the frames_per_clip I selected? Will the video be ignored?
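For reference, here is how many clips a plain sliding window would produce for a given video length. This is a sketch under the assumption that clips are cut at the native frame rate with no padding and no resampling; `num_clips` is a hypothetical helper, and the actual dataset may behave differently.

```python
def num_clips(num_frames, frames_per_clip, step_between_clips):
    """Number of full clips a sliding window yields, assuming no padding."""
    if num_frames < frames_per_clip:
        return 0  # not enough frames for even a single clip
    return (num_frames - frames_per_clip) // step_between_clips + 1


# Settings from the question: 32 frames per clip, 50 frames between clips.
for n in (20, 32, 81, 82, 132):
    print(n, num_clips(n, 32, 50))
# 20 0
# 32 1
# 81 1
# 82 2
# 132 3
```

Under this assumption, a video with fewer than 32 frames yields zero clips, which would match the repeated entries in cumulative_sizes.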
Thanks in advance for the help!