When scaling training from a single worker to multiple workers (say, multiple GPUs on the same machine), DDP provides abstractions so that I don't have to think about how to best implement gradient synchronization between the workers.
For evaluation, however, it seems that no comparable abstraction or best practice currently exists, and I have to resort to lower-level distributed calls (all_reduce/all_gather) to combine the per-worker metrics into a single value. Is that correct, or does torch provide a similar experience for evaluation?
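For concreteness, here is roughly the pattern I mean (a minimal sketch of my own code, not an official API; `evaluate`, the classification metrics, and the assumption that a process group is already initialized are all specific to my setup):

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F

@torch.no_grad()
def evaluate(model, loader, device):
    # Accumulate raw sums locally on each rank.
    total_loss = torch.zeros(1, device=device)
    total_correct = torch.zeros(1, device=device)
    total_samples = torch.zeros(1, device=device)

    model.eval()
    for inputs, targets in loader:
        inputs, targets = inputs.to(device), targets.to(device)
        logits = model(inputs)
        total_loss += F.cross_entropy(logits, targets, reduction="sum")
        total_correct += (logits.argmax(dim=1) == targets).sum()
        total_samples += targets.size(0)

    # The lower-level calls in question: sum the counters over all ranks.
    for t in (total_loss, total_correct, total_samples):
        dist.all_reduce(t, op=dist.ReduceOp.SUM)

    return (total_loss / total_samples).item(), (total_correct / total_samples).item()
```

(I reduce raw sums rather than per-rank averages so the result stays exact even when ranks see different numbers of samples.)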
I’ve read Ddp: evaluation, gather output, loss, and stuff. how to? but wonder whether things have changed since then.