Is `Dataset.__getitem__()` allowed to shuffle?

My custom `Dataset.__getitem__()` fetches samples (whole batches) from external storage. It suits me better to shuffle the samples right there.

Question: does PyTorch rely on repeated (or even out-of-order) calls to `__getitem__()` with the same index returning the same item? That is not the case for me: I ignore the supplied index for training data and just fetch a random batch of samples.

So far I have assumed it is okay as long as I fetch my batch (and store it in a local variable) only once per training loop iteration.


I don’t think it will break anything, but it means you’re going to recreate the whole sampler logic that already exists :slight_smile:

Thanks for your reply. The sampler doesn’t have much to do: I set the PyTorch batch size to 1 and return NCHW-sized tensors from `__getitem__()`.
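
For what it’s worth, the setup described above can be sketched roughly like this (a minimal, hypothetical example; the class name, shapes, and the `torch.randn` stand-in for the external-storage fetch are all illustrative, not from the original post):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class RandomBatchDataset(Dataset):
    """Each "item" is a whole NCHW batch; `index` is ignored, so repeated
    calls with the same index return different data."""

    def __init__(self, num_batches, batch_size, channels=3, height=32, width=32):
        self.num_batches = num_batches
        self.shape = (batch_size, channels, height, width)

    def __len__(self):
        # Defines how many "items" (i.e. whole batches) one epoch contains.
        return self.num_batches

    def __getitem__(self, index):
        # Stand-in for fetching a freshly shuffled batch from external storage.
        return torch.randn(self.shape)

# batch_size=1 as described: each DataLoader "batch" is one pre-built batch.
loader = DataLoader(RandomBatchDataset(num_batches=10, batch_size=8), batch_size=1)
for batch in loader:
    # Default collation adds a leading dim of size 1; squeeze it back to NCHW.
    batch = batch.squeeze(0)  # shape: (8, 3, 32, 32)
```

Note the `squeeze(0)`: with `batch_size=1` the default collate function still stacks items, so the tensors come out as 1×N×C×H×W unless you strip that extra dimension (or pass `batch_size=None` to disable automatic batching entirely).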