Distributed evaluation with DDP


When scaling training from a single worker to multiple workers (say, multiple GPUs on the same machine), DDP provides abstractions so that I do not have to think about how to best implement synchronization between the workers.

For evaluation, however, it seems no such abstraction or best practice currently exists, and I have to resort to lower-level distributed calls to gather/reduce all my metrics into a single value. Is that correct, or does torch somehow provide a similar experience for evaluation?
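To make the question concrete, this is roughly the kind of lower-level code I mean: manually averaging a per-worker metric with `dist.all_reduce`. This is just a sketch of my current approach (the function name is my own); it assumes the process group has already been initialized.

```python
import torch
import torch.distributed as dist


def reduce_metric(total: float, count: int, device: torch.device) -> float:
    """Average a (sum, count) metric across all workers.

    Each rank passes its local running sum and sample count; the
    collective sums both over all ranks, then we divide.
    """
    t = torch.tensor([total, count], dtype=torch.float64, device=device)
    dist.all_reduce(t, op=dist.ReduceOp.SUM)  # element-wise sum over ranks
    return (t[0] / t[1]).item()
```

Every evaluation metric in my code needs a call like this, which is why I am asking whether a higher-level API exists.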

I’ve read “Ddp: evaluation, gather output, loss, and stuff. how to?” but wonder whether things have changed since then.

@ptrblck @suraj.pt It would be really helpful if you could provide some guidance on this. Thanks!

I would start with the PyTorch ImageNet example and use it as a template.

Thanks for the helpful link. Defining an AverageMeter class and calling reduce seems to be very common (I do it in my own code, based on multiple other repos that use this idea). Since this practice is common enough to be needed in many different places, including what should be the simplest code (ImageNet evaluation), are there any plans to incorporate it into PyTorch itself? Is there any reason to avoid that?
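For reference, the pattern I mean looks roughly like this: an AverageMeter that tracks a local sum/count and synchronizes across workers before reporting. This is my own sketch of the idiom, not anything PyTorch ships; the class and method names are illustrative, and the cross-worker sync assumes the process group is initialized (it is skipped otherwise).

```python
import torch
import torch.distributed as dist


class AverageMeter:
    """Track a running sum/count of a metric, with optional DDP sync."""

    def __init__(self):
        self.sum = 0.0
        self.count = 0

    def update(self, value: float, n: int = 1):
        # Accumulate a batch-mean `value` weighted by batch size `n`.
        self.sum += value * n
        self.count += n

    @property
    def avg(self) -> float:
        return self.sum / max(self.count, 1)

    def all_reduce(self, device: str = "cpu"):
        # Sum (sum, count) over all ranks so .avg becomes globally correct.
        if dist.is_available() and dist.is_initialized():
            t = torch.tensor([self.sum, self.count],
                             dtype=torch.float64, device=device)
            dist.all_reduce(t, op=dist.ReduceOp.SUM)
            self.sum, self.count = t[0].item(), int(t[1].item())
```

In an evaluation loop, each rank calls `update()` per batch and then `all_reduce()` once at the end before reading `.avg`.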

thanks @ptrblck I will work on this.