Evaluating with DistributedDataParallel
should be done with care; otherwise the resulting values could be inaccurate. DistributedSampler pads the dataset with duplicated samples when the number of samples is not evenly divisible by the number of processes. How DistributedSampler works is explained here.
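A small sketch of the padding behavior (the dataset and replica counts are illustrative; passing `num_replicas` and `rank` explicitly avoids the need for an initialized process group):

```python
from torch.utils.data import DistributedSampler

# Toy dataset: 10 samples split across 3 processes (10 % 3 != 0).
dataset = list(range(10))
per_rank_indices = [
    list(DistributedSampler(dataset, num_replicas=3, rank=r, shuffle=False))
    for r in range(3)
]

# Each rank receives ceil(10 / 3) = 4 samples, so 12 indices in total:
# two of them are duplicates padded from the start of the index list.
all_indices = [i for rank in per_rank_indices for i in rank]
```

With this setup, every rank yields exactly 4 indices, so two samples are evaluated twice, which is what can skew evaluation metrics.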
This padding exists because DDP synchronizes gradients at every backward pass, so the number of minibatches must be the same for all processes during training. At evaluation time, however, no backward pass occurs and this constraint is unnecessary.
To avoid padded duplicates skewing your metrics, you can use a custom sampler such as DistributedEvalSampler. Regarding the communication between the DDP processes, you can refer to this example.
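A minimal sketch of such a padding-free sampler (the class name and the interleaved split are assumptions for illustration; see the DistributedEvalSampler repository for the actual implementation):

```python
from torch.utils.data import Sampler

class EvalSampler(Sampler):
    """Hypothetical sketch of a padding-free distributed sampler:
    each rank gets an interleaved slice of the indices, so ranks may
    receive different numbers of samples when the dataset size is not
    divisible by the number of replicas."""

    def __init__(self, dataset, num_replicas, rank):
        # No shuffling and no padding: every sample appears exactly once
        # across all ranks, in dataset order.
        self.indices = list(range(len(dataset)))[rank::num_replicas]

    def __iter__(self):
        return iter(self.indices)

    def __len__(self):
        return len(self.indices)

# 10 samples over 3 ranks: shard sizes are 4, 3, 3 with no duplicates.
shards = [list(EvalSampler(list(range(10)), num_replicas=3, rank=r))
          for r in range(3)]
```

Because the shards are uneven, such a sampler must only be used for evaluation (no backward pass); during training the mismatched minibatch counts would stall DDP's gradient synchronization.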