Randomness during batching or bucketing in torchdata for Map-style pipes

After reading the docs and some of the source code, a question came to mind: given that Batcher and Shuffler are provided as built-in pipes, should we really use them instead of relying on DataLoader and Sampler?
It seems that Shuffler does not reshuffle dynamically across epochs, which produces identical batches every epoch, and there is no API for knowing the current epoch when __getitem__ is called (compared to set_epoch in traditional samplers).
The former would be simpler to support, while the latter seems to involve an API change.
I am wondering whether a solution for "dynamically sorting the items/batches with an epoch argument" across epochs is planned for the future.
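For reference, this is the set_epoch pattern I mean with traditional samplers (a minimal single-process sketch; passing num_replicas=1 and rank=0 explicitly just avoids having to initialize a process group):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

dataset = TensorDataset(torch.arange(100))
# Explicit num_replicas/rank so torch.distributed does not need to be initialized.
sampler = DistributedSampler(dataset, num_replicas=1, rank=0, shuffle=True)
loader = DataLoader(dataset, batch_size=8, sampler=sampler)

for epoch in range(3):
    sampler.set_epoch(epoch)  # reseeds the shuffle so the order differs per epoch
    for batch in loader:
        pass  # training step
```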

Sorry to bother you if there is already an existing way to do this.
Thanks very much!

Hi,

Thanks for your feedback. We are aware of the first issue you mentioned; it is being tracked and worked on here.

As for your second point, I believe the solution before torchdata was to create a new Sampler for each epoch. You can write a custom DataPipe that takes in an epoch argument and generates a new DataPipe for each epoch. Let me know what your usage example looks like and how the current API is insufficient or inefficient. We are open to improving our API for additional common use cases.
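Here is a minimal sketch of that idea (the names make_epoch_pipe and base_seed are just illustrative; this assumes torchdata's map-style SequenceWrapper and the .batch() functional form of Batcher):

```python
import random
from torchdata.datapipes.map import SequenceWrapper

def make_epoch_pipe(data, epoch, base_seed=0, batch_size=32):
    # Derive a per-epoch RNG, mimicking DistributedSampler.set_epoch semantics.
    rng = random.Random(base_seed + epoch)
    indices = list(range(len(data)))
    rng.shuffle(indices)
    # Build a fresh map-style pipe over the permuted data, then batch it.
    pipe = SequenceWrapper([data[i] for i in indices])
    return pipe.batch(batch_size)
```

The key point is that the pipe is cheap to reconstruct, so reshuffling per epoch is just a matter of calling this with a new epoch value before each pass over the data.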

Thanks!

I think the solution of creating a new pipe with an epoch argument works for me. I did not realize I could create new DataPipes inside the training loop (right before starting each new epoch). Thanks!
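For anyone finding this later, my loop ended up looking roughly like this (using the hypothetical make_epoch_pipe helper sketched above, where dataset is any indexable sequence):

```python
num_epochs = 10
for epoch in range(num_epochs):
    pipe = make_epoch_pipe(dataset, epoch)  # fresh shuffle each epoch
    for i in range(len(pipe)):  # map-style access by index
        batch = pipe[i]
        # ... training step on batch ...
```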