How to generate random pairs at each epoch?

Hi, I am trying to generate a series of random pairs {x_1, x_2} from a specific dataset X. I have done this before in TensorFlow by implementing a pair-shuffling function in the on_epoch_end() method of the TensorFlow generator. Is there a similar hook for the DataLoader in PyTorch?

Note: setting shuffle=True in the DataLoader would not work for me, since it only shuffles the data in X and does not generate random pairs.

Thank you!

How would these pairs be generated? Would samples be redrawn from the dataset or would all combinations be used?
I think the proper way would be to write a custom sampler and implement your sampling logic there.
The sampler passes its indices to Dataset.__getitem__ to load each sample, so it could pass a pair of indices for each item instead of a single index.
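
A minimal sketch of that idea, assuming the data fits in a single tensor and that pairs are drawn with replacement (so a sample can be paired with itself). The names PairDataset and RandomPairSampler, the toy tensor X, and num_pairs are hypothetical and only for illustration. The sampler yields (i, j) tuples, and because a new set is drawn each time the DataLoader is iterated, the pairs change every epoch:

```python
import torch
from torch.utils.data import DataLoader, Dataset, Sampler


class PairDataset(Dataset):
    """Hypothetical wrapper: __getitem__ receives an (i, j) index pair."""

    def __init__(self, data):
        self.data = data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index_pair):
        i, j = index_pair
        return self.data[i], self.data[j]


class RandomPairSampler(Sampler):
    """Hypothetical sampler: yields num_pairs random (i, j) tuples per pass."""

    def __init__(self, dataset_size, num_pairs):
        self.dataset_size = dataset_size
        self.num_pairs = num_pairs

    def __iter__(self):
        # Redrawn on every call, i.e. once per epoch.
        # Indices are drawn with replacement, so i == j can occur.
        pairs = torch.randint(self.dataset_size, (self.num_pairs, 2))
        return iter(map(tuple, pairs.tolist()))

    def __len__(self):
        return self.num_pairs


X = torch.randn(100, 3, 8, 8)  # toy stand-in for the dataset X
dataset = PairDataset(X)
sampler = RandomPairSampler(len(dataset), num_pairs=256)
loader = DataLoader(dataset, batch_size=64, sampler=sampler)

for epoch in range(2):
    for x1, x2 in loader:
        pass  # x1 and x2 each have shape (64, 3, 8, 8)
```

The default collate function stacks the first and second elements of each pair separately, which is why the loop can unpack x1 and x2 directly.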

Thanks for the reply! @ptrblck

Sample pairs are randomly drawn from the dataset. I am not sure I understand correctly: would a Sampler only output a list of indices, rather than index pairs?

Could you just double the batch size of the dataloader?

E.g.:

import torch
from torch.utils.data import DataLoader

# assuming the dataset yields image tensors only, so each batch is (128, 3, H, W)
batch_size = 128
dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True, drop_last=True)

for batch in dataloader:
    # split into two halves of shape (64, 3, H, W); row k of x1 is paired with row k of x2
    x1, x2 = torch.split(batch, batch_size // 2, dim=0)

Yeah, this should work. I might also duplicate the dataset (so that each point has a chance to be paired with itself). Thank you so much!
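
One possible way to allow self-pairs without physically duplicating the data is to draw two independent permutations per epoch and pair them position by position. This is only a sketch of that variant, assuming the whole dataset is an in-memory tensor; the helper epoch_pairs and the toy tensor X are hypothetical names:

```python
import torch


def epoch_pairs(data, batch_size, generator=None):
    """Yield (x1, x2) batches for one epoch from two independent shuffles."""
    n = len(data)
    # Each sample appears exactly once in each slot per epoch;
    # a sample is paired with itself whenever both permutations agree.
    first = torch.randperm(n, generator=generator)
    second = torch.randperm(n, generator=generator)
    for start in range(0, n, batch_size):
        idx1 = first[start:start + batch_size]
        idx2 = second[start:start + batch_size]
        yield data[idx1], data[idx2]


X = torch.randn(100, 3, 8, 8)  # toy stand-in for the dataset X
for epoch in range(2):
    for x1, x2 in epoch_pairs(X, batch_size=32):
        pass  # x1 and x2 have matching shapes; the last batch is smaller
```

Calling epoch_pairs again at the start of each epoch redraws both permutations, which plays the role of the on_epoch_end() reshuffle.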