Hi all!
I have two datasets: dataset 1 has roughly 400 samples and dataset 2 roughly 1000. I want to train my network so that it sees about one sample from dataset 1 for every 5 samples from dataset 2.
With a batch size of 32, each batch containing 5 samples from dataset 1 and 27 from dataset 2 would work. How would I go about setting up a DataLoader to achieve this? I definitely want to train on all 1000 samples from dataset 2 every epoch, and just randomly sample however many I need from dataset 1 to achieve that distribution.
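To make the 5 + 27 composition concrete, here is a rough sketch of the batches I have in mind. It assumes the two datasets are wrapped as `ConcatDataset([dataset1, dataset2])`, so dataset 2 indices are offset by the length of dataset 1. `FixedRatioBatchSampler` is a name I made up for illustration, not something from PyTorch.

```python
import random


class FixedRatioBatchSampler:
    """Hypothetical helper: yields lists of indices into ConcatDataset([ds1, ds2]).

    Every batch holds k1 random indices from ds1 plus the next k2 indices of a
    shuffled pass over ds2, so each ds2 sample appears exactly once per epoch.
    """

    def __init__(self, n1, n2, k1=5, k2=27):
        self.n1, self.n2, self.k1, self.k2 = n1, n2, k1, k2

    def __len__(self):
        # Number of batches needed to cover ds2 once (the last one may be short).
        return (self.n2 + self.k2 - 1) // self.k2

    def __iter__(self):
        order2 = list(range(self.n2))
        random.shuffle(order2)
        for start in range(0, self.n2, self.k2):
            # ds2 indices are shifted by n1 because ds1 comes first in the concat.
            part2 = [self.n1 + i for i in order2[start:start + self.k2]]
            # Fresh random draw from ds1 for every batch (no repeats within a batch).
            part1 = random.sample(range(self.n1), self.k1)
            yield part1 + part2


# Sizes from my case: about 400 and 1000 samples.
batches = list(FixedRatioBatchSampler(400, 1000))
print(len(batches))  # 38 batches; the last one has only 1 sample from dataset 2
```

If I read the docs correctly, an object like this could be passed as `batch_sampler=` to a `DataLoader` wrapping the `ConcatDataset`, with `batch_size`, `shuffle`, and `sampler` left at their defaults, but I am not sure this is the recommended way.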
I looked at ConcatDataset, but that just combines the two into one large dataset and samples from it. I saw that DataLoader has a sampler parameter; could I achieve what I want with a WeightedRandomSampler?
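For the WeightedRandomSampler idea, this is roughly how I would compute the per-sample weights, as a sketch only. It assumes the sampler draws indices from `ConcatDataset([dataset1, dataset2])`; `mixing_weights` is a hypothetical helper name, and the sizes 400 and 1000 are my approximate dataset sizes.

```python
import math


def mixing_weights(n1, n2, k1=5, k2=27):
    """Hypothetical helper: per-sample weights over ConcatDataset([ds1, ds2]).

    The weights are chosen so that, in expectation, k1 out of every k1 + k2
    draws come from ds1 and k2 come from ds2.
    """
    w1 = k1 / (k1 + k2) / n1  # total ds1 mass k1/(k1+k2), spread over n1 samples
    w2 = k2 / (k1 + k2) / n2  # total ds2 mass k2/(k1+k2), spread over n2 samples
    weights = [w1] * n1 + [w2] * n2
    # Draws per epoch so that about n2 of them are expected to land in ds2.
    num_samples = math.ceil(n2 * (k1 + k2) / k2)
    return weights, num_samples


weights, num_samples = mixing_weights(400, 1000)
print(num_samples)  # 1186
```

These would then go into `WeightedRandomSampler(weights, num_samples=num_samples, replacement=True)`, passed as `sampler=` to the `DataLoader`. As far as I understand, though, this only matches the ratio in expectation and draws with replacement, so it would not guarantee that every sample from dataset 2 is seen each epoch.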
Any help is appreciated!