Keeping an image in the same split after random_split() across multiple datapipes

With reference to the above diagram, we have a use case where we create a separate datapipe for each label class.

Let’s say we have an image, Image101.jpg.

Image101.jpg has been annotated with label_cls_1 and label_cls_2, and it will be in the result set of the DB query for both datapipe_1 and datapipe_3.

How can we make sure that, after random_split(), Image101.jpg ends up in either the TRAIN set or the TEST set for both datapipe_1 and datapipe_3?
If Image101.jpg lands in test_datapipe_1 after random_split(), it should also land in test_datapipe_3 and not in train_datapipe_3.
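The requirement boils down to one invariant: no image may sit in the train split of one datapipe and in the test split of another. A hypothetical helper such as `find_leaked_images` below could be run after splitting to catch violations. It is plain Python and assumes each split can be reduced to a list of image file names; the datapipe names and files in the demo are made up for illustration.

```python
from typing import Dict, Iterable, Set


def find_leaked_images(
    train_splits: Dict[str, Iterable[str]],
    test_splits: Dict[str, Iterable[str]],
) -> Set[str]:
    """Return images that sit in some train split and in some test split."""
    # Collect every image that any datapipe put into its train split.
    all_train: Set[str] = set()
    for images in train_splits.values():
        all_train.update(images)
    # Collect every image that any datapipe put into its test split.
    all_test: Set[str] = set()
    for images in test_splits.values():
        all_test.update(images)
    # A non-empty intersection means train/test leakage across pipelines.
    return all_train & all_test


if __name__ == "__main__":
    # Hypothetical split results where datapipe_1 and datapipe_3 disagree.
    train = {"datapipe_1": ["Image102.jpg"], "datapipe_3": ["Image101.jpg", "Image106.jpg"]}
    test = {"datapipe_1": ["Image101.jpg"], "datapipe_3": ["Image105.jpg"]}
    print(find_leaked_images(train, test))  # prints {'Image101.jpg'}
```

Splitting each datapipe independently gives no guarantee that this set stays empty, which is exactly the situation described above.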

What would be the recommended way to do this?

Relying on the randomness of the splitting approach (e.g. by seeding it) could be quite tricky and could easily cause issues. I would recommend splitting the dataset once and then using the split indices to create Subsets for the other datasets, which makes sure no data leaks from the training set into the test set in any pipeline.
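A minimal sketch of the "split once, reuse everywhere" idea in plain Python, assuming each datapipe's DB query result can be reduced to a list of image file names. Because every per-label pipeline holds a different subset of images, this sketch splits the pool of unique image names rather than positional indices, then filters each pipeline against that single split. `split_image_ids`, `split_per_label`, and the demo data are hypothetical names for illustration, not a library API.

```python
import random
from typing import Dict, List, Sequence, Set, Tuple


def split_image_ids(
    image_ids: Sequence[str], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[Set[str], Set[str]]:
    """Split the global pool of unique image names exactly once."""
    # Deduplicate and sort so the split depends only on the seed, not on query order.
    unique_ids = sorted(set(image_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_ids)
    n_test = int(round(len(unique_ids) * test_fraction))
    return set(unique_ids[n_test:]), set(unique_ids[:n_test])


def split_per_label(
    per_label_images: Dict[str, List[str]], test_fraction: float = 0.2, seed: int = 0
) -> Dict[str, Tuple[List[str], List[str]]]:
    """Apply the single global split to every per-label datapipe."""
    all_ids = [img for imgs in per_label_images.values() for img in imgs]
    train_ids, test_ids = split_image_ids(all_ids, test_fraction, seed)
    splits = {}
    for name, imgs in per_label_images.items():
        # Filter each pipeline by membership instead of splitting it independently.
        train = [img for img in imgs if img in train_ids]
        test = [img for img in imgs if img in test_ids]
        splits[name] = (train, test)
    return splits


if __name__ == "__main__":
    # Hypothetical DB query results: image names returned for each datapipe.
    per_label = {
        "datapipe_1": ["Image101.jpg", "Image102.jpg", "Image103.jpg"],
        "datapipe_2": ["Image104.jpg", "Image105.jpg"],
        "datapipe_3": ["Image101.jpg", "Image105.jpg", "Image106.jpg"],
    }
    for name, (train, test) in split_per_label(per_label, test_fraction=0.3, seed=42).items():
        print(name, "train:", train, "test:", test)
```

With map-style datasets, the positions of the images that fall into the train set could be passed to `torch.utils.data.Subset(dataset, indices)`; with IterDataPipes, the same membership test could presumably serve as the predicate of a `filter()` step. Either way the split is decided exactly once, so Image101.jpg cannot end up in train_datapipe_3 while it is in test_datapipe_1.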