Equivalent of tf.data.Dataset.zip in PyTorch?

Hi, I am moving from TensorFlow to PyTorch and I am looking for an equivalent of tf.data.Dataset.zip in PyTorch.

Use case:

  • I have multiple quantities to precompute for a generated dataset dso (built by sampling).

  • Several derived datasets have to be created by mapping functions f1, f2, …, fn: dsi = fi(dso), whose elements can be scalars, vectors, or matrices.

  • I want to batch-iterate over all these datasets (including the original one) simultaneously, after applying the same shuffle to each.

Is there an (elegant) way to do this in PyTorch?

Thanks for any help.

If it is feasible for you to use IterDataPipe from torchdata (it is a replacement for IterableDataset), you can use Zipper.
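A minimal sketch of how the pieces could fit together, assuming a torch/torchdata combination where datapipes are available; the random tensors and the two lambdas below are placeholders for the real dso and the mapping functions f1, f2:

```python
import torch
from torch.utils.data import DataLoader
from torchdata.datapipes.iter import IterableWrapper

# Placeholder for the generated dataset dso (here: 8 random 3-vectors).
dso = IterableWrapper([torch.randn(3) for _ in range(8)])

# Fork the source so each branch gets its own copy of the stream
# (the same IterDataPipe should not feed several branches directly).
d0, d1, d2 = dso.fork(num_instances=3)

# Derived datasets dsi = fi(dso); the lambdas stand in for f1, f2.
ds1 = d1.map(lambda x: x.sum())  # scalar per sample
ds2 = d2.map(lambda x: 2.0 * x)  # vector per sample

# Zipper (functional form .zip) yields aligned tuples (x, f1(x), f2(x)),
# the analogue of tf.data.Dataset.zip.
zipped = d0.zip(ds1, ds2)

# Shuffling the zipped tuples keeps all components aligned; the DataLoader
# then batches and collates them as usual.
loader = DataLoader(zipped.shuffle(), batch_size=4, shuffle=True)

for x, s, v in loader:
    print(x.shape, s.shape, v.shape)  # e.g. [4, 3], [4], [4, 3]
```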


Thank you for the good pointers.
I think Zipper is suitable for my use case.
I had never dived into this part of the documentation :wink:
Best,
Christophe.