In this case, I would like to inject noise into the training dataset only. How can I do that? I can't pass an argument like data="train" or data="valid", as is normally done…
You could add the noise in the training loop (outside of the Dataset).
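A minimal sketch of the training-loop option, assuming additive Gaussian noise and a toy TensorDataset. The helper name add_gaussian_noise and the std value are made up for illustration and are not from the original code:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def add_gaussian_noise(x, std=0.1):
    # Hypothetical helper: returns a noisy copy, leaving the original tensor untouched.
    return x + std * torch.randn_like(x)


torch.manual_seed(0)
data = torch.zeros(8, 3)      # placeholder inputs
targets = torch.arange(8)     # placeholder labels
train_loader = DataLoader(TensorDataset(data, targets), batch_size=4)

for inputs, labels in train_loader:
    # Noise is applied here, in the training loop only.
    # The validation/test loops would simply skip this line.
    noisy_inputs = add_gaussian_noise(inputs)
    # forward pass, loss, and backward pass on noisy_inputs would go here
```

Since the noise lives outside the Dataset, the same Dataset object can be reused for validation without any changes.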
Alternatively you could create a Dataset instance for each split (and add the noise to the training dataset), create the split indices, and wrap the datasets together with the corresponding split indices in Subsets.
This is what I am looking for but I don’t know how to do this…
No, you should create the "same" dataset three times, once per split (for the training instance, pass the noise argument, if available).
Each dataset is then passed to Subset with the corresponding indices.
If you are lazily loading the data, you won’t see any performance penalties using this approach.
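Here is a rough sketch of the Subset approach, assuming a toy dataset with a hypothetical noise_std argument (your real Dataset will have its own way to enable noise) and an arbitrary 6/2/2 split:

```python
import torch
from torch.utils.data import Dataset, Subset


class ToyDataset(Dataset):
    # Stand-in for the real dataset; noise_std is a made-up argument for illustration.
    def __init__(self, n=10, noise_std=0.0):
        self.data = torch.arange(n, dtype=torch.float32).unsqueeze(1)
        self.noise_std = noise_std

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        x = self.data[idx]
        if self.noise_std > 0:
            # Noise is sampled on each access, so every epoch sees different noise.
            x = x + self.noise_std * torch.randn_like(x)
        return x


n = 10
# Create the split indices once, so the three subsets do not overlap.
perm = torch.randperm(n).tolist()
train_idx, val_idx, test_idx = perm[:6], perm[6:8], perm[8:]

# The "same" dataset three times; only the training instance adds noise.
train_ds = Subset(ToyDataset(n, noise_std=0.1), train_idx)
val_ds = Subset(ToyDataset(n), val_idx)
test_ds = Subset(ToyDataset(n), test_idx)
```

Each Subset can then be passed to its own DataLoader as usual.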