Loading videos from folders as a dataset object


I’m trying to gather some suggestions about how to implement a video loader that subclasses torch.utils.data.Dataset, so that it can be fed to the torch.utils.data.DataLoader constructor.

Set up

A directory of multiple folders, each containing multiple MP4s with the shorter side scaled to 256.


I could think of my multiple videos as one big, long video of tot_nb_frames frames, which will be reshaped into something like tot_nb_batches x batch_size x height x width x 3, where tot_nb_batches = tot_nb_frames // batch_size.
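As a sanity check on that arithmetic, here is a minimal sketch with made-up video lengths (all numbers are hypothetical):

```python
# Made-up numbers: three MP4s treated as one long concatenated video.
video_lengths = [1000, 2500, 750]             # frames per video (hypothetical)
tot_nb_frames = sum(video_lengths)            # 4250
batch_size = 32
tot_nb_batches = tot_nb_frames // batch_size  # 132 full batches

leftover = tot_nb_frames % batch_size         # 26 frames don't fit the reshape
print(tot_nb_batches, leftover)               # 132 26
```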

Now I have that

DataLoaderIter

will call

_next_indices()

which returns a list (why not a tuple?) of ordered numbers

list(range(t * batch_size, (t + 1) * batch_size))

called indices, with t in the interval [0, tot_nb_batches).
Now my

dataset[i] for i in indices

should return the correct next frame for each row of the batch, so dataset should have an appropriate internal mapping based on batch_size, which is an attribute of DataLoader, not of Dataset.
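To make that mapping concrete, here is a sketch (class name and layout are my assumption, not actual PyTorch API): each row of the batch reads a contiguous strip of the long video, so a flat index i coming from a sequential sampler decomposes into a batch number t = i // batch_size and a row r = i % batch_size, and the frame that row should see is r * tot_nb_batches + t:

```python
try:
    from torch.utils.data import Dataset
except ImportError:          # let the sketch run even without torch installed
    Dataset = object

class ConcatFramesDataset(Dataset):
    """Hypothetical sketch: each batch row reads a contiguous strip of the
    long concatenated video; __getitem__ maps the sampler's flat index to
    the frame index that row should see next."""

    def __init__(self, tot_nb_frames, batch_size):
        self.batch_size = batch_size                       # leaked from DataLoader
        self.tot_nb_batches = tot_nb_frames // batch_size  # frames per row

    def __len__(self):
        return self.tot_nb_batches * self.batch_size       # remainder dropped

    def __getitem__(self, i):
        t, r = divmod(i, self.batch_size)    # batch (column) and row numbers
        return r * self.tot_nb_batches + t   # real code would decode this frame

ds = ConcatFramesDataset(tot_nb_frames=30, batch_size=5)
# batch 0 gathers the heads of the five 6-frame rows
print([ds[i] for i in range(5)])   # [0, 6, 12, 18, 24]
```

The point is that batch_size appears inside the Dataset, which is exactly the design smell discussed below.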

Question / advices

Can anyone provide feedback on this strategy? Does it sound reasonable, or am I missing something?

Given that the mapping is based on batch_size, I am now wondering whether this should be performed by DataLoaderIter instead.
Nevertheless, given a specific initial mapping, the video readers should be initialised with different seek positions. So DataLoaderIter would have to call an initialisation method on Dataset, which I think is not currently supported.
Oh well, I could take a lazy approach: initialise a reader the first time a specific frame is requested. And, yeah, the indexing should be done on the DataLoaderIter side, since the Dataset should not care about batching at all.
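The lazy approach could be sketched like this (all names here are hypothetical, and the reader factory is a stand-in for a real video decoder):

```python
class LazyRowReaders:
    """Sketch of the lazy idea: a per-row reader is only opened (and seeked
    to the row's first frame) the first time that row is asked for a frame."""

    def __init__(self, row_starts, open_reader):
        self.row_starts = row_starts               # first frame index of each row
        self.open_reader = open_reader             # would wrap a real decoder
        self._readers = [None] * len(row_starts)   # nothing is opened up front

    def next_frame(self, r):
        if self._readers[r] is None:               # first touch: open + seek
            self._readers[r] = self.open_reader(self.row_starts[r])
        return next(self._readers[r])

# Toy "decoder": an iterator over frame indices, standing in for seek+decode.
rows = LazyRowReaders([0, 100, 200],
                      open_reader=lambda start: iter(range(start, start + 10)))
print(rows.next_frame(1), rows.next_frame(1), rows.next_frame(0))  # 100 101 0
```

Nothing is opened at construction time, so the Dataset no longer needs an explicit initialisation hook called by DataLoaderIter.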

More quirks

Say I perform the mapping with the Sampler; I will then have variable batch sizes, batch_size and batch_size - 1.

   0     5     10    15    20    25
0  >xxxx|xxxxx|xxxxx|xxxxx|xxxxx|xxx
1  xxxxx|xxxxx|xxx>x|xxxxx|xxxxx|xxx
2  xxxxx|xx>xx|xxxxx|xxxxx|xxxxx|xxx
3  xxxxx|xxxxx|xxxxx|xxxxx|xxxx>|xxx
4  xxxxx|xxxxx|xxxxx|x>xxx|xxxx

After asking for batch 23, the batch size should decrease by one. This is a mess, since _next_indices() is still going to ask for batch_size items, screwing everything up.

Hacky solution 1

I could set the DataLoader's batch_size to 1, have the Dataset object return the batches (columns) itself, and squeeze() out the singleton dimension later on. But it looks nasty…

Hacky solution 2

Given that the missing data (bottom-right corner) amounts to fewer than batch_size frames, and batch_size usually gets as big as 128, I could simply duplicate the last frame, at worst 127 times, which is roughly 2 seconds of video compared to the hours of data. So… I think that's just fine. Otherwise, I could use the beginning of video 0. I think I'll opt for this second option.
The whole thing should look like this.

   0     5     10    15    20    25
0  0oooo|ooooo|ooooo|ooooo|ooooo|ooo
1  ooooo|ooooo|ooo>x|xxxxx|xxxxx|xxx
2  xxxxx|xx>xx|xxxxx|xxxxx|xxxxx|xxx
3  xxxxx|xxxxx|xxxxx|xxxxx|xxxx>|xxx
4  xxxxx|xxxxx|xxxxx|x>xxx|xxxx0|ooo

where > represents the head of a generic video and x its subsequent frames, 0 represents the head of video zero, and o its frames.
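In code, this wrap-around is just a modulo on the frame index. A sketch with made-up sizes (row length rounded up so every batch stays full):

```python
import math

def frame_for(i, batch_size, tot_nb_frames):
    """Hacky solution 2 as arithmetic: rows are ceil-length strips of the
    long video, and any index past the last frame wraps to video 0's head."""
    row_len = math.ceil(tot_nb_frames / batch_size)
    t, r = divmod(i, batch_size)            # batch (column) and row numbers
    return (r * row_len + t) % tot_nb_frames

# 136 frames, batch_size 5 -> rows of 28; the 4 missing slots wrap to frames 0-3
last_batch = [frame_for(i, batch_size=5, tot_nb_frames=136)
              for i in range(27 * 5, 28 * 5)]
print(last_batch)   # [27, 55, 83, 111, 3]
```

Every batch now has exactly batch_size entries, so _next_indices() never over-asks.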


This here is actually an implementation of a dataset class made for videos. Its __getitem__ function loads several frames of a video and returns them.
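I don't have that code at hand, but a minimal sketch of such a class might look like this (everything here is an assumption; decoding is stubbed out, and real code would seek and decode with an actual video reader such as PyAV or OpenCV, then stack the frames into a tensor):

```python
try:
    from torch.utils.data import Dataset
except ImportError:        # keep the sketch runnable without torch installed
    Dataset = object

class VideoClipDataset(Dataset):
    """Hypothetical sketch: item i is a clip of `clip_len` consecutive
    frames from one of the videos; here a 'frame' is just (path, index)."""

    def __init__(self, video_paths, frames_per_video, clip_len=16):
        self.video_paths = video_paths
        self.clip_len = clip_len
        # number of non-overlapping clips each video contributes
        self.clips_per_video = [n // clip_len for n in frames_per_video]

    def __len__(self):
        return sum(self.clips_per_video)

    def __getitem__(self, i):
        for v, n in enumerate(self.clips_per_video):
            if i < n:                       # clip i lives in video v
                start = i * self.clip_len
                # real code: decode these frames and stack them into a tensor
                return [(self.video_paths[v], start + k)
                        for k in range(self.clip_len)]
            i -= n
        raise IndexError(i)

ds = VideoClipDataset(['a.mp4', 'b.mp4'], frames_per_video=[100, 50], clip_len=16)
print(len(ds))     # 100//16 + 50//16 = 6 + 3 = 9
print(ds[6][0])    # first frame of the first clip of 'b.mp4': ('b.mp4', 0)
```

Because each item is a whole clip, a plain sequential DataLoader over this Dataset sidesteps the batch-column bookkeeping above entirely.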