I have three datasets with 1600, 400, and 200 images respectively. I use ConcatDataset to merge the three datasets. However, I see that the model learns to perform well on the first dataset (the one with 1600 images) and performs poorly on the other two. Is it possible to assign weights to the three datasets so that the network sees images from the second and third datasets more often than images from the first? Specifically, I want to weight the individual datasets as 2200/1600, 2200/400, and 2200/200 so that the images are sampled as I want. Also, is there any other way to achieve this? Thanks in advance for any help.
You could use a WeightedRandomSampler as described in this post.
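The gist of that approach: give each sample a weight (e.g. the inverse frequency of its class) and let the sampler draw with replacement. A minimal sketch with made-up toy labels (the tensor values are just for illustration, not code from the linked post):

import torch

targets = torch.tensor([0, 0, 0, 0, 1, 1, 2])   # toy class labels
class_counts = torch.bincount(targets)          # number of samples per class
weights = 1.0 / class_counts[targets].float()   # rarer classes get larger weights
sampler = torch.utils.data.WeightedRandomSampler(
    weights=weights, num_samples=len(weights), replacement=True
)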
@ptrblck Thank you for your reply. Since I am training for object detection, I don't have class targets to weight by, so I am using the following method instead:
import itertools
import torch

dataset_train = torch.utils.data.ConcatDataset([dataset_train_1, dataset_train_2, dataset_train_3])

# Each sample is weighted inversely to the size of its source dataset,
# so images from the smaller datasets are drawn more often.
weights_train = [
    [len(dataset_train) / len(dataset_train_1)] * len(dataset_train_1),
    [len(dataset_train) / len(dataset_train_2)] * len(dataset_train_2),
    [len(dataset_train) / len(dataset_train_3)] * len(dataset_train_3),
]
weights_train = list(itertools.chain.from_iterable(weights_train))

# replacement=True (the default) lets the smaller datasets be revisited.
sampler_train = torch.utils.data.WeightedRandomSampler(weights=weights_train, num_samples=len(weights_train), replacement=True)
This is similar to what is done in the post, except I am not weighting according to the targets but rather according to the source dataset. Thank you again for your suggestion.
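For anyone who finds this later: the sampler then replaces shuffle=True in the DataLoader (the two are mutually exclusive). A minimal sketch continuing the snippet above; batch_size is an arbitrary placeholder, and the collate_fn shown is just the usual detection-style one, adjust it to your setup:

loader_train = torch.utils.data.DataLoader(
    dataset_train,
    batch_size=4,                                  # placeholder value
    sampler=sampler_train,                         # do not also pass shuffle=True
    collate_fn=lambda batch: tuple(zip(*batch)),   # common collate for detection targets
)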