How can I use videos as an image Dataset without extracting all the frames into image folders in advance?

The videos are labelled frame by frame. The frames are used to train a model.
I tried writing a subclass of Dataset that loads the labelling information for all the videos at initialization. __len__() returns the total number of frames across all videos, while __getitem__(idx) computes which video contains the idx-th frame, loads that video, and extracts the corresponding frame.
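The index lookup described here can be sketched with cumulative frame counts and a binary search. This is only a minimal illustration, not the original poster's code: `build_offsets`, `locate_frame`, and the example frame counts are hypothetical names and values introduced for the sketch.

```python
from bisect import bisect_right
from itertools import accumulate


def build_offsets(frame_counts):
    """Return cumulative frame counts, e.g. [3, 5, 2] -> [3, 8, 10]."""
    return list(accumulate(frame_counts))


def locate_frame(offsets, idx):
    """Map a global frame index to (video_index, frame_index_within_video)."""
    if not offsets or not 0 <= idx < offsets[-1]:
        raise IndexError(f"frame index {idx} is out of range")
    # The first cumulative count strictly greater than idx marks the video.
    video_idx = bisect_right(offsets, idx)
    start = offsets[video_idx - 1] if video_idx > 0 else 0
    return video_idx, idx - start


# Hypothetical example: three videos with 3, 5, and 2 labelled frames.
offsets = build_offsets([3, 5, 2])
print(locate_frame(offsets, 0))  # (0, 0)
print(locate_frame(offsets, 3))  # (1, 0)
print(locate_frame(offsets, 9))  # (2, 1)
```

The lookup itself is cheap (logarithmic in the number of videos); the expensive part is opening and decoding the video on every call.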
Loading and searching through a video file on every __getitem__() call seems like a great waste of time and resources. Not surprisingly, even with the batch size set to 4 (images, not videos) and num_workers for the DataLoader set to 1, this code just doesn't work because of a malloc failure.
I used to extract all the frames in advance and use the ImageFolder dataset, but that takes a lot of extra disk space and hours of extraction time.
So are there any other solutions? Thanks~

I am also working on a project involving video files and I have observed the same issue as you. The way I resolved it was to disregard the idx value passed into __getitem__ and instead do my own random indexing within the function. I reload a different random video 10% of the time, and otherwise keep sampling cases from the one already loaded. This greatly reduces the number of video loads, since a new video is only loaded in 10% of the sampling cases (or less, if you like).

Pseudo code:
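import random

def __getitem__(self, index):
    # `index` is deliberately ignored; sampling is done here instead.
    if random.random() > .9:
        # About 10% of the time, reload a new video.
        # loadRandomVideo is a placeholder name for the reload step.
        self.loadRandomVideo()

    # select a scene, and then select a frame
    sceneIdx = random.randrange(0, len(self.videoShots))
    scene = self.videoShots[sceneIdx]
    # randrange excludes its stop value, so frameIndex is in [scene[0], scene[1] - 2]
    frameIndex = random.randrange(scene[0], scene[1] - 1)
    frames, targets = self.buildTrainingCaseFromFrameIndex(frameIndex)
    return frames, targets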
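A slightly fuller sketch of the same idea, under stated assumptions: `CachedVideoFrames`, `load_video`, `reload_prob`, and `epoch_size` are hypothetical names, and `load_video` stands in for whatever decodes a video into two equal-length sequences of frames and labels. The class is written in plain Python so it runs on its own; in a real project it would presumably subclass torch.utils.data.Dataset.

```python
import random


class CachedVideoFrames:
    """Keeps one decoded video in memory and samples random frames from it.

    load_video: hypothetical callable, path -> (frames, labels).
    reload_prob: chance of switching to another video on each access.
    epoch_size: nominal length, since the passed index is ignored.
    """

    def __init__(self, video_paths, load_video, reload_prob=0.1,
                 epoch_size=1000, rng=None):
        self.video_paths = list(video_paths)
        self.load_video = load_video
        self.reload_prob = reload_prob
        self.epoch_size = epoch_size
        self.rng = rng if rng is not None else random.Random()
        self.frames = None
        self.labels = None
        self.loads = 0  # counts how often a video was actually decoded

    def __len__(self):
        return self.epoch_size

    def _reload(self):
        path = self.rng.choice(self.video_paths)
        self.frames, self.labels = self.load_video(path)
        self.loads += 1

    def __getitem__(self, index):
        # The index is ignored on purpose; sampling happens here.
        if self.frames is None or self.rng.random() < self.reload_prob:
            self._reload()
        i = self.rng.randrange(len(self.frames))
        return self.frames[i], self.labels[i]


def fake_load_video(path):
    """Stand-in decoder for the demo: five named frames with integer labels."""
    return [f"{path}:frame{i}" for i in range(5)], list(range(5))


dataset = CachedVideoFrames(["a.mp4", "b.mp4"], fake_load_video, reload_prob=0.1)
samples = [dataset[i] for i in range(100)]
print(f"{dataset.loads} video loads for {len(samples)} samples")
```

Two trade-offs are worth keeping in mind. Consecutive samples mostly come from the same video, so batches are less well mixed than with uniform sampling over all frames. Also, with num_workers greater than 0, each DataLoader worker holds its own copy of the dataset and therefore its own cached video.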

Yeah, this is a great method, thanks~