A custom `Dataset` should certainly work, and depending on the `create_noise` method you could either add the noise directly to the data, as seen in this post, or sample it in each iteration.
Alternatively, you could also write a custom transformation, as seen in this post, which might be a better approach.
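A custom transformation along these lines could look like the following sketch. `AddNoise` is a hypothetical name, and the Gaussian noise here stands in for whatever `create_noise` produces; fresh noise is sampled on every call, i.e. in each iteration:

```python
import torch

class AddNoise:
    """Hypothetical transform adding Gaussian noise to a sample tensor."""
    def __init__(self, std=0.1):
        self.std = std

    def __call__(self, x):
        # sample fresh noise each time the transform is applied
        return x + torch.randn_like(x) * self.std

# usage: pass it as the transform argument of your Dataset
transform = AddNoise(std=0.1)
x = torch.zeros(3, 4)
y = transform(x)
```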
However, based on your description I understand that `create_noise` might be expensive, so you want to avoid calling it for each sample and would thus prefer to call it once for the entire batch.
In this case you could use the `BatchSampler` and pass the indices for the entire batch to `__getitem__`, as seen in this post.
This also means that the `batch_size` specified in the `DataLoader` no longer represents the actual batch size: the number of samples in each batch would instead be `loader.batch_size * sampler.batch_size`.
In my example I've defined the batch size only in the sampler and kept the default in the `DataLoader`.
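A minimal sketch of this approach, assuming the expensive per-batch work is the noise sampling (`MyDataset` and the noise call are illustrative, not the original poster's code). `__getitem__` receives the whole list of indices produced by the `BatchSampler`, so the expensive operation runs once per batch; with the `DataLoader` batch size left at its default of `1`, each yielded batch contains `1 * sampler.batch_size` samples, stacked with an extra leading dimension:

```python
import torch
from torch.utils.data import Dataset, DataLoader
from torch.utils.data.sampler import BatchSampler, SequentialSampler

class MyDataset(Dataset):
    def __init__(self, data):
        self.data = data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, indices):
        # indices is the list of indices for the entire batch
        x = self.data[indices]
        # expensive per-batch call would go here; plain Gaussian
        # noise is used as a stand-in for create_noise
        noise = torch.randn_like(x) * 0.1
        return x + noise

dataset = MyDataset(torch.randn(100, 5))
sampler = BatchSampler(SequentialSampler(dataset),
                       batch_size=10, drop_last=False)
loader = DataLoader(dataset, sampler=sampler)  # batch_size kept at default

batch = next(iter(loader))
# batch.shape == [1, 10, 5]: loader.batch_size * sampler.batch_size samples
```

If the extra leading dimension is unwanted, you could squeeze it out inside the training loop.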