The videos are labelled frame by frame. The frames are used to train a model.
I tried to write a subclass of Dataset that loads all the labelling information for every video at initialization. Its __len__() returns the total frame count across all videos, and its __getitem__(idx) method computes which video contains the idx-th frame, loads that video, and extracts the corresponding frame.
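For reference, a minimal sketch of that design might look like the following (this assumes OpenCV's cv2.VideoCapture for decoding; video_paths and labels_per_video are hypothetical stand-ins for however the paths and per-frame labels are actually stored):

import bisect
import cv2
import torch
from torch.utils.data import Dataset

class VideoFrameDataset(Dataset):
    # Hypothetical sketch of the design described above:
    # one sample per labelled frame, across all videos.
    def __init__(self, video_paths, labels_per_video):
        # labels_per_video[i][j] is assumed to be the label of
        # frame j in video i
        self.video_paths = video_paths
        self.labels = labels_per_video
        # cumulative frame counts, so idx can be mapped to (video, frame)
        self.cum_frames = []
        total = 0
        for labels in labels_per_video:
            total += len(labels)
            self.cum_frames.append(total)

    def __len__(self):
        return self.cum_frames[-1]

    def __getitem__(self, idx):
        # find which video contains the idx-th frame
        vid = bisect.bisect_right(self.cum_frames, idx)
        start = self.cum_frames[vid - 1] if vid > 0 else 0
        frame_idx = idx - start
        # open the video and seek to the frame; this per-item decode
        # is exactly the cost complained about below
        cap = cv2.VideoCapture(self.video_paths[vid])
        cap.set(cv2.CAP_PROP_POS_FRAMES, frame_idx)
        ok, frame = cap.read()
        cap.release()
        if not ok:
            raise RuntimeError("failed to read frame %d" % idx)
        # OpenCV yields HxWxC uint8 BGR; convert to a CHW float tensor
        image = torch.from_numpy(frame).permute(2, 0, 1).float() / 255.0
        return image, self.labels[vid][frame_idx]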
Loading and searching a video file for every single __getitem__() query seems like a great waste of time and resources. Not surprisingly, even with the batch size set to 4 (that's images, not videos) and num_workers for the DataLoader set to 1, this code simply doesn't work; it dies with a malloc failure.
I used to extract all the frames in advance and use an ImageFolder dataset, but that takes a lot of extra disk space and hours of time to extract the frames.
So are there any other solutions? Thanks~
I am also working on a project involving video files and I have observed the same issue as you. The way I resolved it was to disregard the idx value passed into __getitem__ and instead do my own random indexing within the function: I randomly reload a different video 10% of the time, and otherwise keep sampling cases from the one already loaded. This greatly reduces the number of videos that need to be loaded, since new video loads only happen in 10% of the sampling cases (or less, if you like).
Pseudo code:
def __getitem__(self, index):
    # ignore `index`; sometimes reload a new video
    if random.random() > 0.9:
        self.loadNewVideo()
    # select a scene, and then select a frame within it
    sceneIdx = random.randrange(0, len(self.videoShots))
    scene = self.videoShots[sceneIdx]
    frameIndex = random.randrange(scene[0], scene[1] - 1)
    frames, targets = self.buildTrainingCaseFromFrameIndex(frameIndex)
    return frames, targets
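To make the idea concrete, here is a self-contained sketch of the same caching strategy, again assuming OpenCV for decoding; the scene list and buildTrainingCaseFromFrameIndex helper from the pseudo code are replaced by hypothetical stand-ins, and labels are omitted for brevity:

import random
import cv2
import torch
from torch.utils.data import Dataset

class CachedVideoDataset(Dataset):
    # Hypothetical fleshed-out version of the strategy above: keep one
    # decoded video in memory and swap it out only on a fraction of queries.
    def __init__(self, video_paths, samples_per_epoch=10000, reload_prob=0.1):
        self.video_paths = video_paths
        self.samples_per_epoch = samples_per_epoch
        self.reload_prob = reload_prob
        self.frames = []
        self.load_new_video()

    def load_new_video(self):
        # fully decode one randomly chosen video into memory
        # (assumes the clips are short enough to fit in RAM)
        cap = cv2.VideoCapture(random.choice(self.video_paths))
        frames = []
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            frames.append(frame)
        cap.release()
        self.frames = frames

    def __len__(self):
        # nominal epoch length; samples are drawn randomly, not by index
        return self.samples_per_epoch

    def __getitem__(self, index):
        # ignore `index`; occasionally pay the cost of loading a new video,
        # otherwise keep sampling from the cached one
        if random.random() < self.reload_prob:
            self.load_new_video()
        frame = self.frames[random.randrange(len(self.frames))]
        return torch.from_numpy(frame).permute(2, 0, 1).float() / 255.0

Two things worth keeping in mind with this pattern: with num_workers > 0 each DataLoader worker process holds its own cached video, and because sampling ignores the index, __len__ only controls how many samples count as one epoch.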
Yeah, this is a great method, thanks~