I am a beginner in PyTorch. I want to split a dataset into two parts, a training set and a validation set, while using
torch.distributed. I know that on a single GPU I can do this with a sampler:
indices = list(range(len(train_data)))
train_loader = torch.utils.data.DataLoader(
    train_data,
    batch_size=args.batch_size,
    sampler=torch.utils.data.sampler.SubsetRandomSampler(indices[:split]),
    pin_memory=True,
    num_workers=2)
But when I want to train in parallel using
torch.distributed, I have to use a different sampler, namely:
sampler = torch.utils.data.distributed.DistributedSampler(train_data)
So how can I combine the two samplers, so that I can split the dataset and distribute it across processes at the same time?
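My current guess (I'm not sure this is the right approach) is to do the index split first with torch.utils.data.Subset, and then hand only the training subset to DistributedSampler. Here is a minimal sketch of that idea; the toy TensorDataset, the 80/20 split, and the explicit num_replicas/rank values are placeholders for illustration (in real DDP code they would come from the initialized process group):

```python
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Toy dataset standing in for train_data (placeholder for illustration)
train_data = TensorDataset(torch.randn(100, 3), torch.randint(0, 2, (100,)))

# 1) Split the indices first, just like on a single GPU
split = int(0.8 * len(train_data))
indices = list(range(len(train_data)))
train_subset = Subset(train_data, indices[:split])
val_subset = Subset(train_data, indices[split:])

# 2) Then let DistributedSampler shard only the training subset across ranks.
# num_replicas/rank are hard-coded here; normally they come from
# dist.get_world_size() / dist.get_rank() after init_process_group.
train_sampler = DistributedSampler(train_subset, num_replicas=2, rank=0,
                                   shuffle=True)

train_loader = DataLoader(train_subset,
                          batch_size=8,
                          sampler=train_sampler,
                          pin_memory=True,
                          num_workers=2)
```

With this, each rank should see its own shard of the 80-sample training subset (40 samples per rank for 2 replicas), but I'd like to confirm whether this is the intended way, or whether the two samplers need to be combined differently.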
Thank you very much for any help!