DataPipes and DDP training

I do not understand how to use DataPipes in multi-GPU training. Specifically, I have the following questions:

  1. Where should shuffling happen? In the DataPipe (via `.shuffle()`) or in the DataLoader (via `shuffle=True`)?
  2. Is it correct to use `DistributedSampler` with a DataLoader + DataPipe?
  3. Are there any examples of DDP + DataPipe usage?
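To make question 1 concrete, here is my current mental model of per-rank sharding, sketched in plain Python. `shard_for_rank` is my own illustrative helper, not a torchdata API; I am assuming that shuffling happens with a seed shared across ranks, and that each rank then keeps every `world_size`-th sample (which is what I think `sharding_filter` does). Is this the right picture?

```python
import random

def shard_for_rank(samples, rank, world_size, seed=0):
    """Shuffle with a seed shared by all ranks, then keep every
    world_size-th element starting at this rank's offset."""
    rng = random.Random(seed)          # same seed on every rank
    shuffled = list(samples)
    rng.shuffle(shuffled)
    return shuffled[rank::world_size]  # round-robin sharding

samples = list(range(10))
shards = [shard_for_rank(samples, r, world_size=2) for r in range(2)]
# Together the two shards should cover every sample exactly once.
assert sorted(shards[0] + shards[1]) == samples
```

If this is roughly right, then I don't see where `DistributedSampler` would fit in, since the sharding already happens inside the pipeline.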