DistributedDataParallel and SubsetRandomSampler

I am currently using SubsetRandomSampler to enforce a train-val split on my custom dataset, which works well on my current single-GPU configuration. However, in anticipation of moving to training across multiple nodes and GPUs, I wanted to see if it's possible to "wrap" the splits created by SubsetRandomSampler so that, within my train split, I can replicate the functionality of DistributedSampler.
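For reference, my current single-GPU setup looks roughly like this (the toy dataset, the 80/20 ratio, and the batch size are just placeholders for my real configuration):

```python
import torch
from torch.utils.data import DataLoader, SubsetRandomSampler

# Toy stand-in for my custom dataset; the real one is much larger.
dataset = torch.utils.data.TensorDataset(torch.arange(100).float())

# 80/20 train-val split over shuffled indices.
indices = torch.randperm(len(dataset)).tolist()
split = int(0.8 * len(dataset))
train_idx, val_idx = indices[:split], indices[split:]

train_loader = DataLoader(dataset, batch_size=10,
                          sampler=SubsetRandomSampler(train_idx))
val_loader = DataLoader(dataset, batch_size=10,
                        sampler=SubsetRandomSampler(val_idx))

# Each epoch visits all 80 train samples, each exactly once,
# in a fresh random order.
seen = [int(x) for batch in train_loader for x in batch[0]]
print(len(seen), len(set(seen)))  # 80 80
```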

If not – what alternatives do I have for creating a train-val split? Must I create separate Dataset objects for the train and val sets?
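One direction I've been considering, sketched below, is to wrap the train indices in torch.utils.data.Subset so that the split becomes a Dataset that DistributedSampler can shard. (The num_replicas/rank values are hard-coded here to simulate one of two processes; in a real run they would come from the torch.distributed process group.)

```python
import torch
from torch.utils.data import DataLoader, Subset
from torch.utils.data.distributed import DistributedSampler

# Same placeholder dataset and 80/20 split as in my current setup.
dataset = torch.utils.data.TensorDataset(torch.arange(100).float())
indices = torch.randperm(len(dataset)).tolist()
split = int(0.8 * len(dataset))

# Subset gives a Dataset view over just the train indices, so
# DistributedSampler can shard it like any other dataset.
train_set = Subset(dataset, indices[:split])

# Hard-coded to simulate rank 0 of a 2-process job; normally these
# default to the values from the initialized process group.
sampler = DistributedSampler(train_set, num_replicas=2, rank=0)
loader = DataLoader(train_set, batch_size=10, sampler=sampler)

seen = [int(x) for batch in loader for x in batch[0]]
print(len(seen))  # 40 -- this rank's shard of the 80 train samples
```

I'm not sure whether this interacts correctly with shuffling across epochs (i.e. whether sampler.set_epoch still behaves as expected on a Subset), which is part of what I'm asking.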

cc @vincentqb for the dataloader question. :slight_smile: