I have three datasets with 1600, 400, and 200 images respectively. I use ConcatDataset to merge the three datasets. However, I see that the model learns to perform well on the first dataset (the one with 1600 images) and performs poorly on the other two. Is it possible to assign weights to the three datasets so that the network sees images from the second and third datasets more often than images from the first? Specifically, I want to weight the individual datasets as 2200/1600, 2200/400, and 2200/200 so that the images are sampled as I want. Also, is there any other way to achieve this? Thanks in advance for any help.
You could use a WeightedRandomSampler as described in this post.
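The gist of that approach: give each sample a weight (e.g. the inverse frequency of its class) and let the sampler draw with replacement. A minimal sketch with made-up toy labels (the tensor values are just for illustration, not code from the linked post):

import torch

targets = torch.tensor([0, 0, 0, 0, 1, 1, 2])   # toy class labels
class_counts = torch.bincount(targets)          # number of samples per class
weights = 1.0 / class_counts[targets].float()   # rarer classes get larger weights
sampler = torch.utils.data.WeightedRandomSampler(
    weights=weights, num_samples=len(weights), replacement=True
)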
@ptrblck Thank you for your reply. Since I am training for object detection, I don't have class targets to weight by, so I am using the following method instead:
import itertools
import torch

dataset_train = torch.utils.data.ConcatDataset([dataset_train_1, dataset_train_2, dataset_train_3])

# Each sample is weighted inversely to the size of its source dataset,
# so images from the smaller datasets are drawn more often.
weights_train = [
    [len(dataset_train) / len(dataset_train_1)] * len(dataset_train_1),
    [len(dataset_train) / len(dataset_train_2)] * len(dataset_train_2),
    [len(dataset_train) / len(dataset_train_3)] * len(dataset_train_3),
]
weights_train = list(itertools.chain.from_iterable(weights_train))

# replacement=True (the default) lets the smaller datasets be revisited.
sampler_train = torch.utils.data.WeightedRandomSampler(weights=weights_train, num_samples=len(weights_train), replacement=True)
This is similar to what is done in the post, except I am not weighting according to the targets but rather according to the source dataset. Thank you again for your suggestion.
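For anyone who finds this later: the sampler then replaces shuffle=True in the DataLoader (the two are mutually exclusive). A minimal sketch continuing the snippet above; batch_size is an arbitrary placeholder, and the collate_fn shown is just the usual detection-style one, adjust it to your setup:

loader_train = torch.utils.data.DataLoader(
    dataset_train,
    batch_size=4,                                  # placeholder value
    sampler=sampler_train,                         # do not also pass shuffle=True
    collate_fn=lambda batch: tuple(zip(*batch)),   # common collate for detection targets
)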