How to train two datasets with one dataloader?

Hello all, I have a problem as follows:

  • Dataset 1 has 1000 images
  • Dataset 2 has 10 images.

I want to train a model in which the images in Dataset 2 are always used in training. For example, with a batch size of 64, each mini-batch should contain all 10 images from Dataset 2 and the remaining 54 from Dataset 1. I am using a custom dataset with a distributed sampler. Is there any solution for that?

Have you tried using a separate dataloader with batch size 10 for Dataset 2?

Good idea, but how do I get them to work together?

Create dataloader1 for Dataset 1 with batch size 54.
Create dataloader2 for Dataset 2 with batch size 10.
Then do something like:

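import torch

for x1, y1 in dataloader1:
    # batch size 10 == len(Dataset 2), so each call returns all 10 images
    x2, y2 = next(iter(dataloader2))
    # concatenate into one mini-batch of 54 + 10 = 64 samples
    x = torch.cat([x1, x2], dim=0)
    y = torch.cat([y1, y2], dim=0)
    # use x, y for the forward and backward pass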
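A self-contained sketch of the two-dataloader approach, assuming synthetic stand-in data: `dataset1`/`dataset2` are `TensorDataset`s of random 3x8x8 "images" labelled 0 and 1 so the batch composition is easy to check, and `combined_batches` is a hypothetical helper name. `drop_last=True` on the first loader keeps every batch at exactly 64; without it the last batch would hold the 28 leftover Dataset 1 images plus the 10 from Dataset 2.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-ins (assumption): 1000 and 10 "images" of shape 3x8x8.
# Labels 0/1 only mark which dataset a sample came from.
dataset1 = TensorDataset(torch.randn(1000, 3, 8, 8), torch.zeros(1000, dtype=torch.long))
dataset2 = TensorDataset(torch.randn(10, 3, 8, 8), torch.ones(10, dtype=torch.long))

BATCH_SIZE = 64
n2 = len(dataset2)        # 10 images that must appear in every batch
n1 = BATCH_SIZE - n2      # 54 images drawn from dataset 1 per batch

# drop_last=True so every combined batch has exactly 64 samples
dataloader1 = DataLoader(dataset1, batch_size=n1, shuffle=True, drop_last=True)
dataloader2 = DataLoader(dataset2, batch_size=n2, shuffle=True)

def combined_batches(loader1, loader2):
    """Hypothetical helper: yield (x, y) = 54 samples of dataset 1 + all of dataset 2."""
    for x1, y1 in loader1:
        # batch_size == len(dataset2), so a single batch is the whole dataset
        x2, y2 = next(iter(loader2))
        yield torch.cat([x1, x2], dim=0), torch.cat([y1, y2], dim=0)

batches = list(combined_batches(dataloader1, dataloader2))
```

In a training loop you would write `for x, y in combined_batches(dataloader1, dataloader2):` and do the forward/backward pass on `x, y`. Calling `iter(dataloader2)` every step is cheap with the default `num_workers=0`; with worker processes it would restart them each step, so for a dataset this small it may be simpler to keep that loader single-process.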

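Since the question mentions a distributed sampler, one possible arrangement (an assumption, not something confirmed in the thread) is to shard only Dataset 1 with `DistributedSampler` and let every rank load all of Dataset 2 without one. The sketch below passes `num_replicas` and `rank` explicitly so it runs in a single process without `init_process_group`, and uses integer indices as hypothetical stand-in data so the split is easy to inspect; `make_loaders` and `rank_batches` are illustrative names.

```python
import torch
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

# Stand-in data (assumption): each "image" is just its index, to make batches easy to inspect.
dataset1 = TensorDataset(torch.arange(1000))
dataset2 = TensorDataset(torch.arange(1000, 1010))

def make_loaders(rank, world_size, batch_size=64, epoch=0):
    n2 = len(dataset2)
    # Shard dataset 1 across ranks; num_replicas/rank given explicitly so no process group is needed here
    sampler1 = DistributedSampler(dataset1, num_replicas=world_size, rank=rank, shuffle=True)
    sampler1.set_epoch(epoch)  # call once per epoch in real training to reshuffle
    loader1 = DataLoader(dataset1, batch_size=batch_size - n2, sampler=sampler1, drop_last=True)
    # No DistributedSampler for dataset 2: every rank gets all 10 samples in each batch
    loader2 = DataLoader(dataset2, batch_size=n2, shuffle=True)
    return loader1, loader2

def rank_batches(rank, world_size):
    loader1, loader2 = make_loaders(rank, world_size)
    batches = []
    for (x1,) in loader1:
        (x2,) = next(iter(loader2))
        batches.append(torch.cat([x1, x2], dim=0))
    return batches

# Simulate two ranks in one process
per_rank = {r: rank_batches(r, world_size=2) for r in range(2)}
```

With this layout each rank's local batch is 54 + 10 samples, so under DDP the same 10 images are present on every rank and the global batch contains `world_size` copies of them.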
