Custom Dataset that splits the data and applies Transforms accordingly

ptrblck · October 20, 2021, 5:40am

Instead of using random_split, you could create two CustomDataset instances, each with its own transformation:

train_dataset = CustomDataset(filenames, train_transform)
val_dataset = CustomDataset(filenames, val_transform)

and then use Subset on both with their corresponding indices:

train_dataset = Subset(train_dataset, train_indices)
val_dataset = Subset(val_dataset, val_indices)

which makes sure that only the samples at the corresponding *_indices are drawn from the internal dataset.
Also, since you are lazily loading the data (which is great!), the memory overhead should be small (only the file paths would be duplicated, but that shouldn't matter).
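
The train_indices and val_indices used above are not defined in this post. One possible way to create them is a seeded shuffle of all sample positions, cut into two disjoint lists. This is only a minimal sketch: split_indices, val_fraction and seed are hypothetical names chosen for illustration, and the 80/20 ratio is an assumption.

```python
import random


def split_indices(num_samples, val_fraction=0.2, seed=0):
    """Return disjoint (train_indices, val_indices) covering range(num_samples).

    Hypothetical helper: the name, the default fraction and the seed are
    illustrative assumptions, not part of the original post.
    """
    indices = list(range(num_samples))
    # A dedicated Random instance keeps the split reproducible without
    # touching the global random state.
    random.Random(seed).shuffle(indices)
    num_val = int(num_samples * val_fraction)
    val_indices = indices[:num_val]
    train_indices = indices[num_val:]
    return train_indices, val_indices


# Example with 10 samples; in practice num_samples would be len(filenames).
train_indices, val_indices = split_indices(10, val_fraction=0.2, seed=0)
```

Since both lists come from the same shuffled permutation, no sample can end up in both subsets. The resulting lists can then be passed to Subset as shown above.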