I am using a ConcatDataset with a WeightedRandomSampler like this:
```python
training_sets = data_augment(training_set)
self.train_dataset = ConcatDataset([
    MyDataset(
        features=self.features,
        transform=transforms.Compose([ToTensor()]),
    )
    for _set in training_sets
])
num_samples = len(self.train_dataset)
weights = np.linspace(1, 3, num_samples)
sampler = WeightedRandomSampler(weights, num_samples)
self.train_loader = DataLoader(
    self.train_dataset,
    sampler=sampler,
    num_workers=7,
    batch_size=self.batch_size,
)
```
My idea in using the WeightedRandomSampler is to train more often on recent inputs (I want to see if that improves generalization performance).
However, since the weights span the whole ConcatDataset, a single np.linspace over the global indices will favor the sub-datasets
that happen to be concatenated at the end, rather than the most recent inputs within each individual set. Is that right?
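Here is a quick self-contained check of that premise, using toy TensorDatasets in place of my actual sets. ConcatDataset maps global indices sequentially through the sub-datasets, so one linear ramp of weights peaks on whichever set comes last:

```python
import numpy as np
import torch
from torch.utils.data import ConcatDataset, TensorDataset

# Three toy sub-datasets; each sample stores the id of the set it came from.
datasets = [
    TensorDataset(torch.full((n,), i, dtype=torch.long))
    for i, n in enumerate((4, 5, 3))
]
concat = ConcatDataset(datasets)

# Global indices 0..3 hit set 0, 4..8 hit set 1, 9..11 hit set 2.
print(concat[0][0].item(), concat[4][0].item(), concat[9][0].item())  # 0 1 2

# A single ramp over the concatenated length puts its maximum weight
# on the very last sample of the very last sub-dataset.
weights = np.linspace(1, 3, len(concat))
print(weights.argmax() == len(concat) - 1)  # True
```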
If that’s the case, then the fix would be to build one np.linspace per sub-dataset
and concatenate the weight arrays in the same order that ConcatDataset concatenates my training sets.
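Something like this sketch, again with toy TensorDatasets standing in for my augmented sets (the per-set lengths are made up):

```python
import numpy as np
import torch
from torch.utils.data import ConcatDataset, TensorDataset, WeightedRandomSampler

# Toy sub-datasets standing in for the augmented training sets.
datasets = [TensorDataset(torch.arange(n, dtype=torch.float32)) for n in (4, 5, 3)]
train_dataset = ConcatDataset(datasets)

# One 1 -> 3 ramp per sub-dataset, concatenated in the same order as
# ConcatDataset, so the *end of each set* gets the highest weight.
weights = np.concatenate([np.linspace(1, 3, len(d)) for d in datasets])

sampler = WeightedRandomSampler(weights, num_samples=len(train_dataset))
```

So the weight resets to 1 at the start of every sub-dataset instead of climbing monotonically across the whole concatenation.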
Thoughts?