Why is there no distributed inference?

So far I have only used a singler-server multi-GPU environment but in principle, DDP can be used at inference time, too.

What hinders using DDP at inference are the

  1. synchronization at backward
  2. DistributedSampler that modifies the dataloader so that the number of samples are evenly divisible by the number of GPUs.

At inference, you don’t need backward computation and you don’t want to modify the evaluation data.
You can use a custom dataloader for evaluation similarly this example to avoid the problems.

A related thread is here.