Evaluating with DistributedDataParallel
should be done with care; otherwise the resulting values could be inaccurate. DistributedSampler pads the dataset with duplicated samples when the number of samples is not evenly divisible by the number of processes. How DistributedSampler works is explained here.
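A small sketch of the padding behavior (the dataset and replica counts are illustrative; passing `num_replicas` and `rank` explicitly avoids the need for an initialized process group):

```python
from torch.utils.data import DistributedSampler

# Toy dataset: 10 samples split across 3 processes (10 % 3 != 0).
dataset = list(range(10))
per_rank_indices = [
    list(DistributedSampler(dataset, num_replicas=3, rank=r, shuffle=False))
    for r in range(3)
]

# Each rank receives ceil(10 / 3) = 4 samples, so 12 indices in total:
# two of them are duplicates padded from the start of the index list.
all_indices = [i for rank in per_rank_indices for i in rank]
```

With this setup, every rank yields exactly 4 indices, so two samples are evaluated twice, which is what can skew evaluation metrics.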
This padding exists because DDP synchronizes gradients at every backward pass, so the number of minibatches must be the same for all processes during training. At evaluation time, however, no backward pass occurs and this constraint is unnecessary.
To avoid padded duplicates skewing your metrics, you can use a custom sampler such as DistributedEvalSampler. Regarding the communication between the DDP processes, you can refer to this example.
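A minimal sketch of such a padding-free sampler (the class name and the interleaved split are assumptions for illustration; see the DistributedEvalSampler repository for the actual implementation):

```python
from torch.utils.data import Sampler

class EvalSampler(Sampler):
    """Hypothetical sketch of a padding-free distributed sampler:
    each rank gets an interleaved slice of the indices, so ranks may
    receive different numbers of samples when the dataset size is not
    divisible by the number of replicas."""

    def __init__(self, dataset, num_replicas, rank):
        # No shuffling and no padding: every sample appears exactly once
        # across all ranks, in dataset order.
        self.indices = list(range(len(dataset)))[rank::num_replicas]

    def __iter__(self):
        return iter(self.indices)

    def __len__(self):
        return len(self.indices)

# 10 samples over 3 ranks: shard sizes are 4, 3, 3 with no duplicates.
shards = [list(EvalSampler(list(range(10)), num_replicas=3, rank=r))
          for r in range(3)]
```

Because the shards are uneven, such a sampler must only be used for evaluation (no backward pass); during training the mismatched minibatch counts would stall DDP's gradient synchronization.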