The ImageNet example has a DistributedSampler for the training loader, but not for the validation loader. This would appear to have every rank processing the entire validation set. Is this necessary, or could a DistributedSampler be used for the validation loader as well, so that the multiple nodes share the work of processing the validation set?
I have the same question. Were you able to find an answer for this?
I found a couple of examples where a DistributedSampler was used for the validation or test set. I'm still not sure why the official ImageNet example doesn't use one; it still seems wasteful to me. Here are a few of the examples:
It is not necessary for every rank to process the entire validation set. You can use a DistributedSampler for validation and average the per-rank errors afterwards to get the same result.
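For illustration, here is a minimal single-process sketch (not from the official example) of how `DistributedSampler` splits a validation set across ranks. Passing `num_replicas` and `rank` explicitly lets us simulate two ranks without initializing a process group; in a real DDP job these default to the process group's world size and rank:

```python
from torch.utils.data import DistributedSampler

# A toy "validation set" of 8 samples; the sampler only needs len().
val_dataset = list(range(8))

# Simulate two ranks in one process by passing num_replicas/rank explicitly.
rank_indices = {
    rank: list(DistributedSampler(val_dataset, num_replicas=2, rank=rank,
                                  shuffle=False))
    for rank in range(2)
}

# The shards are disjoint and together cover the whole set, so each rank
# can evaluate only its own portion.
print(rank_indices)

# In the actual validation loop, each rank would accumulate local sums
# (e.g. loss_sum, correct, count) and combine them across ranks with
#   torch.distributed.all_reduce(stats, op=torch.distributed.ReduceOp.SUM)
# before dividing by the global count to get the dataset-wide metric.
```

With `shuffle=False` the sampler strides the index list, so rank 0 sees the even indices and rank 1 the odd ones here; the exact assignment doesn't matter as long as the shards partition the dataset.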
Actually, you cannot use the DDP sampler for validation without a caveat. If you look at DistributedSampler, note that it adds extra samples to the dataset to make it evenly divisible across ranks. Therefore, if your dataset is very small, the final result may differ. The official implementation is correct.