Loading videos from folders as a dataset object


(Alfredo Canziani) #1

Aim

I’m trying to gather suggestions on how to implement a video loader as a subclass of torch.utils.data.Dataset, so that it can be fed to the torch.utils.data.DataLoader constructor.

Set up

A directory of multiple folders, each containing multiple MP4s, with the smaller side scaled to 256.

Proposal

I could think of my multiple videos as one big long video of length tot_nb_frames, which will be reshaped into something like tot_nb_batches x batch_size x height x width x 3, where tot_nb_batches = tot_nb_frames // batch_size.
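The reshape can be sketched with plain index arithmetic; the numbers below (30 frames, batches of 5) are illustrative, not from the post:

```python
# Illustrative sizes (not from the post): 30 frames in batches of 5.
tot_nb_frames, batch_size = 30, 5
tot_nb_batches = tot_nb_frames // batch_size   # 6

# Flat frame indices 0..29, reshaped row-major into
# tot_nb_batches x batch_size (height/width dims omitted for brevity).
batches = [list(range(t * batch_size, (t + 1) * batch_size))
           for t in range(tot_nb_batches)]
batches[0]  # → [0, 1, 2, 3, 4]
```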

Now I have that

next(DataLoaderIter)

will call

DataLoaderIter._next_indices()

which returns a list (why not a tuple?) of consecutive integers

list(range(t * batch_size, (t + 1) * batch_size))

called indices, with t in the interval [0, tot_nb_batches).
Now my

dataset[i] for i in indices

should return the correct next frame for each row of the batch, so dataset should hold an appropriate internal mapping, which is based on batch_size, an attribute of DataLoader rather than of Dataset.
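A minimal sketch of that internal mapping (class and numbers are mine, not from the post): the flat index i handed out by the sampler is decoded into a time step and a batch row, and each row is treated as a contiguous stream of tot_nb_batches frames.

```python
class VideoFrameDataset:
    """Sketch of the torch.utils.data.Dataset interface (hypothetical class).

    Maps the flat index i from the sampler to the frame that row
    (i % batch_size) should show at time step (i // batch_size).
    """

    def __init__(self, tot_nb_frames, batch_size):
        self.batch_size = batch_size               # leaked DataLoader knowledge
        self.tot_nb_batches = tot_nb_frames // batch_size

    def __getitem__(self, i):
        t, row = divmod(i, self.batch_size)
        frame_idx = row * self.tot_nb_batches + t  # row-contiguous stream
        return frame_idx                           # real code would decode this frame

    def __len__(self):
        return self.tot_nb_batches * self.batch_size

ds = VideoFrameDataset(30, 5)
# Indices 0..4 (batch t = 0) map to the heads of the 5 parallel streams:
[ds[i] for i in range(5)]  # → [0, 6, 12, 18, 24]
```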

Questions / advice

Can anyone provide feedback on this strategy? Does it sound reasonable, or am I missing something?

Given that the mapping is based on batch_size, I am now wondering whether it should be performed by DataLoaderIter instead.
Nevertheless, given a specific initial mapping, the video readers should be initialised with different seeks. So, DataLoaderIter would have to call an initialisation method of Dataset, which I think is not currently supported.
Oh, well, I could take a lazy approach: initialise a reader the first time one of its frames is requested. And, yeah, the indexing should be done on the DataLoaderIter side, since the Dataset should not care about batching at all.
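The lazy approach could be sketched like this; the class and the open_reader hook are hypothetical, standing in for whatever decoding library ends up being used:

```python
class LazyReaders:
    """Sketch: open a video reader only when its stream is first touched.

    `open_reader` is a hypothetical callable that opens and seeks a
    reader for a given batch row (e.g. wrapping an FFmpeg-based decoder).
    """

    def __init__(self, nb_streams, open_reader):
        self._open = open_reader
        self._readers = [None] * nb_streams  # nothing opened up front

    def reader(self, row):
        if self._readers[row] is None:           # first request for this row:
            self._readers[row] = self._open(row) # open and seek lazily
        return self._readers[row]
```

This way no seek needs to happen at construction time, so the Dataset requires no extra initialisation hook from DataLoaderIter.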

More quirks

Say I perform the mapping with the Sampler: I will end up with variable batch sizes, batch_size and batch_size - 1.

   0     5     10    15    20    25
0  >xxxx|xxxxx|xxxxx|xxxxx|xxxxx|xxx
1  xxxxx|xxxxx|xxx>x|xxxxx|xxxxx|xxx
2  xxxxx|xx>xx|xxxxx|xxxxx|xxxxx|xxx
3  xxxxx|xxxxx|xxxxx|xxxxx|xxxx>|xxx
4  xxxxx|xxxxx|xxxxx|x>xxx|xxxx

After asking for batch 23, the batch size should decrease by one. This is a mess, since _next_indices() is still going to ask for batch_size items, screwing up everything.
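The shortfall is easy to quantify with illustrative numbers (mine, not from the post): 28 real frames split over 5 rows leave 2 empty slots at the bottom-right of the grid.

```python
# Illustrative numbers (not from the post): 28 real frames, 5 rows.
tot_nb_frames, batch_size = 28, 5
tot_nb_batches = -(-tot_nb_frames // batch_size)  # ceil division → 6 time steps
full_slots = tot_nb_batches * batch_size          # 30 slots in the grid
missing = full_slots - tot_nb_frames              # 2 slots with no real frame

# The final time steps therefore yield fewer than batch_size real frames,
# while _next_indices() keeps asking for batch_size of them.
```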

Hacky solution 1

I could set the DataLoader's batch_size = 1, have the Dataset object return the batches (columns) itself, and squeeze() out the singleton dimension later on. But it looks nasty…
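A sketch of this hack (the class name and sizes are mine); the Dataset is indexed by time step and returns one whole column, i.e. one frame per stream:

```python
class BatchDataset:
    """Sketch of hacky solution 1 (hypothetical class): the Dataset itself
    returns a whole column (one frame per stream), so DataLoader runs with
    batch_size=1 and the caller squeezes away the singleton dim afterwards."""

    def __init__(self, tot_nb_frames, nb_streams):
        self.nb_streams = nb_streams
        self.stream_len = tot_nb_frames // nb_streams  # frames per row

    def __getitem__(self, t):
        # One time step: frame t from each of the nb_streams parallel rows.
        return [row * self.stream_len + t for row in range(self.nb_streams)]

    def __len__(self):
        return self.stream_len

ds = BatchDataset(30, 5)
ds[0]  # → [0, 6, 12, 18, 24]
# DataLoader(ds, batch_size=1) would yield this wrapped in one extra
# singleton dimension, hence the squeeze() afterwards.
```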

Hacky solution 2

Given that the missing data (bottom-right corner) is < batch_size, which usually gets as big as 128, I could simply return a duplicate of the last frame, at worst 127 times, which is roughly 2 seconds of video compared to hours of data. So… I think that's just fine. Otherwise, I could use the beginning of video 0. I think I'll opt for this option.
The whole thing should look like this.

   0     5     10    15    20    25
0  0oooo|ooooo|ooooo|ooooo|ooooo|ooo
1  ooooo|ooooo|ooo>x|xxxxx|xxxxx|xxx
2  xxxxx|xx>xx|xxxxx|xxxxx|xxxxx|xxx
3  xxxxx|xxxxx|xxxxx|xxxxx|xxxx>|xxx
4  xxxxx|xxxxx|xxxxx|x>xxx|xxxx0|ooo

where > represents the head of a generic video and x its subsequent frames, 0 represents the head of video zero, and o its frames.
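The wrap-around in the diagram above amounts to modulo arithmetic on the frame index; a sketch with illustrative numbers (28 real frames, 5 rows, not from the post):

```python
def padded_frame(i, tot_nb_frames, batch_size):
    """Sketch of hacky solution 2 (hypothetical helper): slots past the
    real data wrap around to the beginning of video 0 via modulo."""
    tot_nb_batches = -(-tot_nb_frames // batch_size)  # ceil division
    t, row = divmod(i, batch_size)                    # time step, batch row
    return (row * tot_nb_batches + t) % tot_nb_frames

# 28 real frames over 5 rows of 6 time steps → 2 padded slots at the end,
# which map back to frames 0 and 1 (the head of video 0 and its successor).
```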