Hi,
I’m currently working with PyTorch’s DataPipe module and I have a specific requirement for my data preprocessing pipeline. I’m using the following DataPipe: IterableWrapper(range(100 * 100)).shuffle().sharding_filter().map(lambda x: np.array([x, x + 1])).batch(20).
However, I need to make two modifications to this pipeline. Firstly, I would like to obtain the data before the shuffle() operation is applied. Secondly, I want to insert a custom function, fn3, between the sharding_filter() and map(lambda x: np.array([x, x + 1])) operations. The desired result would be:
IterableWrapper(range(100 * 100)).shuffle().sharding_filter().filter(fn3).map(lambda x: np.array([x, x + 1])).batch(20)
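To make the target concrete, here is a minimal sketch of that desired pipeline built one stage at a time, so that the DataPipe before shuffle() has its own name. It assumes IterableWrapper is imported from torch.utils.data.datapipes.iter. The body of fn3 is a hypothetical stand-in (it keeps even numbers), and make_pair and the variable names are only for illustration.

```python
import numpy as np
from torch.utils.data.datapipes.iter import IterableWrapper


def fn3(x):
    # Hypothetical stand-in for the custom predicate: keep even numbers only.
    # filter() expects the function to return a plain Python bool.
    return x % 2 == 0


def make_pair(x):
    # Same transformation as the lambda in the original pipeline.
    return np.array([x, x + 1])


# Keep a reference to each stage instead of chaining everything in one expression.
source = IterableWrapper(range(100 * 100))    # the data before shuffle()
sharded = source.shuffle().sharding_filter()
filtered = sharded.filter(fn3)                # fn3 sits between sharding_filter() and map()
pipeline = filtered.map(make_pair).batch(20)

# Read the un-shuffled data first, then run the full pipeline.
before_shuffle = list(source)
batches = list(pipeline)
```

The two reads are done one after the other on purpose: as far as I understand, an IterDataPipe supports only one active iterator at a time, so iterating source and pipeline simultaneously would not work.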
Could you please guide me on how to achieve these modifications? Any help would be greatly appreciated.