Sampler with multiple dataset simultaneous iteration

I need to iterate simultaneously over multiple datasets (let's say 2 datasets), keeping the elements of each of them isolated: each batch must contain only elements of one dataset, and at each step I want to work with one batch from each dataset. To do so, I think that ConcatDataset can be the right tool, for example:

dataset = ConcatDataset([dataset1, dataset2])
dataloader = DataLoader(dataset, batch_size=128, shuffle=True)
for index, (xb1, xb2) in enumerate(dataloader):

where xb1 refers to the input data and targets associated with one of the 2 datasets.
My first question is: have I understood the use of ConcatDataset correctly? Does this approach solve my problem?

My second question is: how do I pass a sampler to the DataLoader in this situation? Can I, for example, define 2 index tensors Idx1 and Idx2 and pass an option like sampler=(Idx1, Idx2) to the DataLoader?
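For what it's worth, DataLoader's `sampler` argument expects a single Sampler instance rather than a tuple of index tensors. A minimal sketch of restricting one loader to an index subset with `SubsetRandomSampler` (the toy `TensorDataset` and the index list here are assumptions for illustration, not my real data):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, SubsetRandomSampler

# Toy stand-in for one of the real datasets
dataset1 = TensorDataset(torch.randn(100, 4), torch.randint(0, 2, (100,)))

# Hypothetical index subset for this dataset
Idx1 = list(range(50))

# One sampler per loader; shuffle must stay False when a sampler is given
loader1 = DataLoader(dataset1, batch_size=16, sampler=SubsetRandomSampler(Idx1))

for xb, yb in loader1:
    # batches are drawn only from the indices in Idx1
    break
```

The same pattern with a second sampler over Idx2 would drive a second loader, keeping the two index sets fully separated.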

An alternative approach could be to create a dataloader for each dataset, each one with its own sampler, and use zip() to iterate simultaneously over the 2 datasets. Is there a cleaner solution for that? (Also because I read that (source):

cycle() and zip() might create a memory leakage problem - especially when using image datasets!
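For concreteness, this is a sketch of the zip()-based alternative I mean, with toy TensorDatasets standing in for my real datasets:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins for the two real datasets
dataset1 = TensorDataset(torch.randn(64, 4), torch.randint(0, 2, (64,)))
dataset2 = TensorDataset(torch.randn(64, 4), torch.randint(0, 2, (64,)))

loader1 = DataLoader(dataset1, batch_size=16, shuffle=True)
loader2 = DataLoader(dataset2, batch_size=16, shuffle=True)

# zip() stops at the shorter loader; each step yields one batch per dataset,
# so batches are never mixed across datasets
for (xb1, yb1), (xb2, yb2) in zip(loader1, loader2):
    pass  # operate on the pair of batches here
```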


Using a custom dataset might be the best option here.

Hi, thank you for your comment!
Then how can I prevent elements associated with different datasets from being mixed inside a given batch?
And how can I perform operations on a pair of batches (one from each dataset) at each step?

If you use the Dataset class from PyTorch, you will have full control over how your data is loaded, and you can ensure that no memory leak happens.

For making pairs of batches, you can use the DataLoader API and create batches of the dataset according to your needs.
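As a sketch of that idea (the name `PairedDataset` is my own, not a PyTorch API): a custom Dataset can return one sample from each underlying dataset, so the default collate function produces one batch per dataset at every step, with no cross-dataset mixing:

```python
import torch
from torch.utils.data import Dataset, DataLoader, TensorDataset

class PairedDataset(Dataset):
    """Hypothetical wrapper: index i yields one sample from each dataset."""
    def __init__(self, ds1, ds2):
        self.ds1, self.ds2 = ds1, ds2
        self.length = min(len(ds1), len(ds2))

    def __len__(self):
        return self.length

    def __getitem__(self, i):
        return self.ds1[i], self.ds2[i]

# Toy stand-ins for the two real datasets
dataset1 = TensorDataset(torch.randn(64, 4), torch.randint(0, 2, (64,)))
dataset2 = TensorDataset(torch.randn(64, 4), torch.randint(0, 2, (64,)))

loader = DataLoader(PairedDataset(dataset1, dataset2),
                    batch_size=16, shuffle=True)

# Each step yields one batch from each dataset; default_collate keeps
# the two datasets in separate (xb, yb) pairs
for (xb1, yb1), (xb2, yb2) in loader:
    pass  # operate on the pair of batches here
```

Note that with shuffle=True both datasets are sampled at the same random indices; if the two datasets should be shuffled independently, a separate random permutation per dataset inside `__getitem__` would be needed.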