Conditioned Dataloader

Hello All,

Let’s say I have a dataloader which returns video samples, where each sample has the shape [number_of_frames X 2048] (the number of frames can be any value from the set [60, 80, 100, 120]).

I want every batch returned from the dataloader to contain only samples with the same number of frames.
For example, if batchsize=16, I want batches of the following shapes: [16 X 80 X 2048], [16 X 60 X 2048], …
Any idea how to do it?

Should these batches be returned sequentially or are you expecting to get a batch for each sequence length in a single step?

In the latter case, you could wrap each Dataset for a specific sequence length in one MainDataset, which could return a frame for each sequence length. The DataLoader would then create the batches.

I am expecting to get a batch for each sequence length.
For example, one batch could be of shape [16, 80, 2048], and another batch could also be [16, 80, 2048] or [16, 120, 2048]. I thought about sorting the samples by length, but I want to shuffle them!

You could probably write a custom sampler, which could yield the data indices based on the condition of the data length, but I’m not sure how complicated that would be.
Maybe the easier approach is to create separate DataLoaders, one for each length, use their iterators, and randomly call next on them?

Something like this might work:

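import random

import torch
from torch.utils.data import DataLoader, TensorDataset

# One dataset per sequence length (the feature sizes 1, 2, 3 are toy stand-ins)
dataset1 = TensorDataset(torch.randn(10, 1), torch.randn(10, 1))
dataset2 = TensorDataset(torch.randn(10, 2), torch.randn(10, 2))
dataset3 = TensorDataset(torch.randn(10, 3), torch.randn(10, 3))

loader1 = DataLoader(dataset1, batch_size=2)
loader2 = DataLoader(dataset2, batch_size=2)
loader3 = DataLoader(dataset3, batch_size=2)

# One iterator per loader, so each next() call yields a batch of a single length
loader_iter1 = iter(loader1)
loader_iter2 = iter(loader2)
loader_iter3 = iter(loader3)

iters = [loader_iter1, loader_iter2, loader_iter3]

nb_epochs = 50  # each pass of the loop draws one batch, not a full epoch
for epoch in range(nb_epochs):
    if not iters:  # every loader has been fully consumed
        break
    curr_iter = random.choice(iters)
    try:
        data, target = next(curr_iter)
    except StopIteration:
        # The original snippet is cut off here. Dropping the exhausted
        # iterator is an assumption, just one possible way to handle it.
        iters.remove(curr_iter)
        continue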
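
As for the custom sampler idea mentioned above, here is a rough sketch of how it might look. It assumes you know the number of frames of every sample up front and can pass them in as a list. The class name SameLengthBatchSampler, the lengths list, and the seed argument are all made up for illustration and are not part of PyTorch. The class is plain Python and just yields lists of indices, which is what the batch_sampler argument of DataLoader expects.

```python
import random
from collections import defaultdict


class SameLengthBatchSampler:
    """Yields lists of dataset indices whose samples all share one length."""

    def __init__(self, lengths, batch_size, drop_last=False, seed=None):
        self.batch_size = batch_size
        self.drop_last = drop_last
        self.rng = random.Random(seed)
        # Group the dataset indices by their number of frames.
        self.buckets = defaultdict(list)
        for idx, length in enumerate(lengths):
            self.buckets[length].append(idx)

    def __iter__(self):
        batches = []
        for indices in self.buckets.values():
            indices = indices[:]        # copy so the buckets stay untouched
            self.rng.shuffle(indices)   # shuffle samples within one length
            for start in range(0, len(indices), self.batch_size):
                batch = indices[start:start + self.batch_size]
                # Keep a short trailing batch unless drop_last is set.
                if len(batch) == self.batch_size or not self.drop_last:
                    batches.append(batch)
        self.rng.shuffle(batches)       # shuffle the order of the batches
        return iter(batches)

    def __len__(self):
        total = 0
        for indices in self.buckets.values():
            full, rest = divmod(len(indices), self.batch_size)
            total += full + (1 if rest and not self.drop_last else 0)
        return total


# Hypothetical frame counts for a dataset of 10 samples.
lengths = [60, 80, 80, 60, 120, 80, 60, 120, 80, 60]
sampler = SameLengthBatchSampler(lengths, batch_size=2, seed=0)
for batch in sampler:
    # Every index in a batch points to a sample with the same frame count.
    print(batch, [lengths[i] for i in batch])
```

You would then create the loader via DataLoader(dataset, batch_sampler=sampler) and leave out batch_size, shuffle, sampler, and drop_last, since those cannot be combined with batch_sampler. Each iteration over the sampler reshuffles both the samples inside a length group and the order of the batches, so you get shuffling without mixing lengths.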