Instead of using random_split, you could create two CustomDataset instances, each with its own transformation:
train_dataset = CustomDataset(filenames, train_transform)
val_dataset = CustomDataset(filenames, val_transform)
and then use Subset
on both with their corresponding indices:
train_dataset = Subset(train_dataset, train_indices)
val_dataset = Subset(val_dataset, val_indices)
which will make sure that only the samples at the respective *_indices are drawn from the underlying dataset.
Also, since you are lazily loading the data (which is great!), the memory overhead should be small (only the file paths would be duplicated, but that shouldn’t matter).
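Putting the steps above together, a minimal sketch could look like this. The CustomDataset body, the placeholder file names, the two tagging transforms, and the 80/20 split are assumptions made only for illustration; your own dataset would load and transform the actual files in __getitem__.

```python
import torch
from torch.utils.data import Dataset, Subset


class CustomDataset(Dataset):
    """Hypothetical stand-in for the lazily loading dataset from the question."""

    def __init__(self, filenames, transform=None):
        self.filenames = filenames
        self.transform = transform

    def __len__(self):
        return len(self.filenames)

    def __getitem__(self, index):
        # A real dataset would load the file here; the file name is returned
        # directly to keep this sketch self-contained.
        sample = self.filenames[index]
        if self.transform is not None:
            sample = self.transform(sample)
        return sample


# Placeholder file paths (assumption, not taken from the question).
filenames = [f"img_{i}.png" for i in range(10)]

# Placeholder transforms that only tag the sample with the split name.
train_transform = lambda x: ("train", x)
val_transform = lambda x: ("val", x)

# Shuffle the indices once and split 80/20 so the two sets do not overlap.
generator = torch.Generator().manual_seed(0)
indices = torch.randperm(len(filenames), generator=generator).tolist()
split = int(0.8 * len(filenames))
train_indices, val_indices = indices[:split], indices[split:]

# Each Subset wraps its own CustomDataset instance with its own transform.
train_dataset = Subset(CustomDataset(filenames, train_transform), train_indices)
val_dataset = Subset(CustomDataset(filenames, val_transform), val_indices)
```

Both Subset objects can then be passed to their own DataLoader as usual. Creating the index split once and reusing it for both subsets is what keeps the training and validation samples disjoint.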