Aim
I’m trying to gather some suggestions on how to implement a video loader by subclassing torch.utils.data.Dataset, so that it can be fed to the torch.utils.data.DataLoader constructor.
Set up
A directory of multiple folders, each containing multiple MP4s whose smaller dimension is scaled to 256.
Proposal
I could think of my multiple videos as one big, long video of length tot_nb_frames, which would be reshaped into something like tot_nb_batches x batch_size x height x width x 3, where tot_nb_batches = tot_nb_frames // batch_size.
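To make the bookkeeping concrete, here is a quick sketch with hypothetical numbers (both batch_size and tot_nb_frames are placeholders, not values from my actual data):

```python
batch_size = 128
tot_nb_frames = 1_000_000                      # frames across all MP4s combined
tot_nb_batches = tot_nb_frames // batch_size   # 7812; the remainder is dropped

# Conceptually: one long video of shape (tot_nb_frames, height, width, 3)
# viewed as (tot_nb_batches, batch_size, height, width, 3).
print(tot_nb_batches)  # 7812
```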
Now, next(DataLoaderIter) will call DataLoaderIter._next_indices(), which returns a list (why not a tuple???) of ordered numbers, list(range(t * batch_size, (t + 1) * batch_size)), called indices, with t in the interval [0, tot_nb_batches).
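For concreteness, the index generation described above can be mimicked like this (toy values for batch_size and tot_nb_batches, not the real ones):

```python
batch_size = 5
tot_nb_batches = 6

def next_indices(t):
    # mirrors the list produced by DataLoaderIter._next_indices()
    # for a plain sequential sampler
    return list(range(t * batch_size, (t + 1) * batch_size))

print(next_indices(0))  # [0, 1, 2, 3, 4]
print(next_indices(3))  # [15, 16, 17, 18, 19]
```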
Now, dataset[i] for i in indices should return the correct next frame for each row of the batch, so dataset should have an appropriate internal mapping, which is based on batch_size, an attribute of DataLoader and not of Dataset.
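One possible internal mapping, sketched under the assumption that row r of the batch plays a contiguous chunk of the long video (toy values again; flat_to_frame is a hypothetical helper, not an existing API):

```python
batch_size = 5       # would have to be smuggled into the Dataset somehow
tot_nb_batches = 6   # frames per row of the layout

def flat_to_frame(i):
    # DataLoader hands the Dataset a flat index i = t * batch_size + r;
    # row r owns frames [r * tot_nb_batches, (r + 1) * tot_nb_batches),
    # and batch t asks each row for its t-th frame.
    t, r = divmod(i, batch_size)
    return r * tot_nb_batches + t

# batch 0 reads the head of each of the 5 chunks
print([flat_to_frame(i) for i in range(batch_size)])  # [0, 6, 12, 18, 24]
```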
Question / advice
Can anyone provide feedback on this strategy? Does it sound reasonable, or am I missing something?
Given that the mapping is based on batch_size, I am now wondering whether it should be performed by DataLoaderIter instead.
Nevertheless, given a specific initial mapping, the video readers would have to be initialised with different seeks. So DataLoaderIter should call an initialisation method of Dataset, but I don’t think this is currently supported.
Oh well, I could take a lazy approach: initialise a reader the first time a specific frame is requested. And, yeah, the indexing should be done on the DataLoaderIter side, since the Dataset should not care about batching at all.
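The lazy approach could be sketched like this (the reader here is a stand-in iterator, not a real decoding API; nb_rows and frames_per_row are toy values):

```python
class LazyFrameSource:
    """Sketch of lazy reader initialisation: nothing is opened in
    __init__; the first request touching a row opens a reader seeked
    to the right frame, so no up-front initialisation call is needed."""

    def __init__(self, nb_rows, frames_per_row):
        self.nb_rows = nb_rows
        self.frames_per_row = frames_per_row
        self._readers = [None] * nb_rows   # one reader per batch row

    def _open_reader(self, row, seek_frame):
        # stand-in for a real video reader seeked to
        # row * frames_per_row + seek_frame in the long video
        start = row * self.frames_per_row + seek_frame
        return iter(range(start, (row + 1) * self.frames_per_row))

    def __getitem__(self, i):
        t, row = divmod(i, self.nb_rows)   # batch t, row of the batch
        if self._readers[row] is None:
            self._readers[row] = self._open_reader(row, seek_frame=t)
        return next(self._readers[row])

src = LazyFrameSource(nb_rows=5, frames_per_row=6)
print([src[i] for i in range(5)])   # batch 0: [0, 6, 12, 18, 24]
```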
More quirks
Say I perform the mapping with the Sampler: I will then have variable batch sizes, batch_size and batch_size - 1.
0 5 10 15 20 25
0 >xxxx|xxxxx|xxxxx|xxxxx|xxxxx|xxx
1 xxxxx|xxxxx|xxx>x|xxxxx|xxxxx|xxx
2 xxxxx|xx>xx|xxxxx|xxxxx|xxxxx|xxx
3 xxxxx|xxxxx|xxxxx|xxxxx|xxxx>|xxx
4 xxxxx|xxxxx|xxxxx|x>xxx|xxxx
After asking for batch 23, the batch size should decrease by one. This is a mess, since _next_indices() is still going to ask for batch_size items, screwing up everything.
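Counting it out (the per-row frame counts below are my reading of the diagram: rows 0–3 have 28 frames, row 4 only 24):

```python
row_lengths = [28, 28, 28, 28, 24]   # frames available per batch row
# how many rows still have a frame at each time step t
sizes = [sum(1 for n in row_lengths if t < n) for t in range(max(row_lengths))]
print(sizes[23], sizes[24])  # 5 4  -> the size drops right after batch 23
```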
Hacky solution 1
I could set the DataLoader’s batch_size to 1, have the Dataset object return the batches (columns) itself, and squeeze() the singleton dimension later on. But it looks nasty…
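Nasty or not, it would look roughly like this (ColumnDataset is a made-up name, and zero tensors stand in for decoded frames):

```python
import torch
from torch.utils.data import DataLoader, Dataset

class ColumnDataset(Dataset):
    """Sketch: each item IS a whole batch (one column of frames)."""

    def __init__(self, nb_batches, rows, h=4, w=4):
        self.nb_batches, self.rows, self.h, self.w = nb_batches, rows, h, w

    def __len__(self):
        return self.nb_batches

    def __getitem__(self, t):
        # stand-in for decoding frame t of each of the `rows` streams
        return torch.zeros(self.rows, self.h, self.w, 3)

loader = DataLoader(ColumnDataset(nb_batches=6, rows=5), batch_size=1)
column = next(iter(loader))   # shape (1, 5, 4, 4, 3)
column = column.squeeze(0)    # drop the singleton: (5, 4, 4, 3)
```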
Hacky solution 2
Given that the missing data (bottom right corner) is < batch_size, which usually gets as big as 128, I could simply return a duplicate of the last frame, at worst 127 times, which is roughly 2 seconds of video compared to hours of data. So… I think it’s just fine. Otherwise, I could use the beginning of video 0. I think I’ll opt for this way.
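A toy sketch of the padding, using the same per-row frame counts I read off the diagram (rows of 28, 28, 28, 28 and 24 frames, padded with the head of video 0):

```python
row_lengths = [28, 28, 28, 28, 24]
target = max(row_lengths)
# global frame indices per row; rows are contiguous chunks of the long video
rows = [list(range(sum(row_lengths[:r]), sum(row_lengths[:r]) + n))
        for r, n in enumerate(row_lengths)]
head_of_video_0 = rows[0][:]
for row in rows:
    # top up any short row with the first frames of video 0
    row += head_of_video_0[:target - len(row)]
print([len(row) for row in rows])  # [28, 28, 28, 28, 28]
```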
The whole thing should look like this.
0 5 10 15 20 25
0 0oooo|ooooo|ooooo|ooooo|ooooo|ooo
1 ooooo|ooooo|ooo>x|xxxxx|xxxxx|xxx
2 xxxxx|xx>xx|xxxxx|xxxxx|xxxxx|xxx
3 xxxxx|xxxxx|xxxxx|xxxxx|xxxx>|xxx
4 xxxxx|xxxxx|xxxxx|x>xxx|xxxx0|ooo
where > represents the head of a generic video and x its subsequent frames, while 0 represents the head of video zero and o its frames.