Hello,

I am trying to implement a dataloader with the following sampling characteristics:

Given a dataset of N data points (e.g. N=10), I want a dataloader in which the data points sampled in the second half of a batch depend on those sampled in the first half. Say the batch size is 4, so half a batch is 2, and the first half contains datapoint 1 and datapoint 5. Suppose also that whenever datapoint 1 is sampled, we want to prioritize sampling datapoint 9 in the same batch. Since datapoint 1 is in the first half, the second half should contain datapoint 9 plus one other datapoint to complete the batch of 4, i.e.,

Dataset (N=10): {1,2,3,4,5,6,7,8,9,10}

Batch size=4

First 2 elements of a batch: {1, 5}

Desired 4 elements of this batch: {1,5,9,n}, where datapoint 9 is desired based on the presence of datapoint 1 in this batch, and n is some other random datapoint.
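
To make the scheme above concrete, here is a rough sketch of the sampling logic in plain Python. The class name PairedBatchSampler and the partners mapping are hypothetical names made up for illustration, and the sketch assumes 0-based indices (so datapoint 1 is index 0 and datapoint 9 is index 8) and an even batch size. It only produces lists of indices and does not load any data itself.

```python
import random


class PairedBatchSampler:
    """Yields batches of indices whose second half depends on the first half.

    Hypothetical sketch: `partners` maps an index to the index we would
    like to see in the same batch (e.g. {0: 8} for datapoint 1 -> datapoint 9).
    """

    def __init__(self, num_samples, batch_size, partners, seed=None):
        if batch_size % 2 != 0:
            raise ValueError("batch_size must be even")
        if batch_size > num_samples:
            raise ValueError("batch_size cannot exceed num_samples")
        self.num_samples = num_samples
        self.batch_size = batch_size
        self.partners = partners
        self.rng = random.Random(seed)

    def __len__(self):
        # One batch per full "first half" drawn from the shuffled indices.
        return self.num_samples // (self.batch_size // 2)

    def __iter__(self):
        half = self.batch_size // 2
        order = list(range(self.num_samples))
        self.rng.shuffle(order)
        for start in range(0, len(order) - half + 1, half):
            # First half: the next `half` indices of the shuffled order.
            first = order[start:start + half]
            batch = list(first)
            # Second half, step 1: add the preferred partner of each
            # first-half index, if it has one and is not already present.
            for idx in first:
                partner = self.partners.get(idx)
                if (partner is not None and partner not in batch
                        and len(batch) < self.batch_size):
                    batch.append(partner)
            # Second half, step 2: fill the remaining slots at random
            # from the indices not yet in this batch.
            remaining = [j for j in range(self.num_samples) if j not in batch]
            batch.extend(self.rng.sample(remaining, self.batch_size - len(batch)))
            yield batch


# Example: 10 data points, batch size 4, index 0 prefers index 8.
sampler = PairedBatchSampler(num_samples=10, batch_size=4, partners={0: 8}, seed=0)
batches = list(sampler)
```

As far as I understand, torch.utils.data.DataLoader accepts a batch_sampler argument that can be any iterable yielding lists of indices, so an object like this could presumably be passed as DataLoader(dataset, batch_sampler=sampler). Note that in this sketch every index appears exactly once per epoch in a first half, but may appear again in second halves, which may or may not be what is wanted.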

Can someone please advise on whether such a custom dataloader can be implemented with PyTorch, and if so, how I can implement it? Thanks a lot in advance!

Best,

Niam