Is there a way to enable distributed inference, instead of training? Also, is it possible to distribute the work across multiple servers each with multiple GPUs, or does it only work for a single server with multiple GPUs? If any of these features are missing, will they be coming out soon?
Lastly, what would be the recommended environment / library to enable distributed inference on multiple servers each with multiple GPUs?
So far I have only used a single-server multi-GPU environment, but in principle DDP can be used at inference time, too.
What hinders using DDP at inference are:

- the gradient synchronization during the backward pass, and
- the DistributedSampler, which modifies the dataloader so that the number of samples is evenly divisible by the number of GPUs (padding the dataset with duplicate samples if necessary).
At inference, you don’t need backward computation and you don’t want to modify the evaluation data.
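To see why the DistributedSampler is a problem for evaluation, here is a small illustration (the dataset size and world size are just made-up numbers): with 10 samples and 4 ranks, the sampler pads the index list so that every rank gets the same number of samples, which means some samples would be evaluated twice.

```python
import torch
from torch.utils.data import TensorDataset, DistributedSampler

# Toy numbers for illustration: 10 samples split across 4 GPUs.
dataset = TensorDataset(torch.arange(10))
sampler = DistributedSampler(dataset, num_replicas=4, rank=0, shuffle=False)

# ceil(10 / 4) = 3 samples per rank, i.e. 12 in total: two samples are
# duplicated so that every rank receives the same amount of work.
print(len(sampler))         # 3
print(list(iter(sampler)))  # indices assigned to rank 0
```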
You can use a custom dataloader for evaluation, similar to this example, to avoid both problems.
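As a rough sketch of what that can look like (the `UnpaddedDistributedSampler` class, the `evaluate` function, and the single-node assumption that each process owns `cuda:rank` are my own illustration, not from a particular example): split the indices across ranks without padding, run the forward pass under `torch.no_grad()`, and `all_reduce` the partial metrics at the end.

```python
import torch
import torch.distributed as dist
from torch.utils.data import DataLoader, Sampler

class UnpaddedDistributedSampler(Sampler):
    """Gives each rank a disjoint slice of the dataset without padding,
    so every sample is evaluated exactly once (ranks may get unequal counts)."""
    def __init__(self, dataset, num_replicas, rank):
        self.indices = list(range(len(dataset)))[rank::num_replicas]

    def __iter__(self):
        return iter(self.indices)

    def __len__(self):
        return len(self.indices)

@torch.no_grad()
def evaluate(model, dataset, batch_size=32):
    rank, world_size = dist.get_rank(), dist.get_world_size()
    device = torch.device(f"cuda:{rank}")  # assumes one process per GPU on a single node

    sampler = UnpaddedDistributedSampler(dataset, world_size, rank)
    loader = DataLoader(dataset, batch_size=batch_size, sampler=sampler)

    model.eval()
    correct = torch.zeros(1, device=device)
    total = torch.zeros(1, device=device)
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        pred = model(x).argmax(dim=1)
        correct += (pred == y).sum()
        total += y.numel()

    # Combine the partial counts from all ranks into a global accuracy.
    dist.all_reduce(correct)
    dist.all_reduce(total)
    return (correct / total).item()
```

Each rank can then call `evaluate()` after the usual `init_process_group` / `torchrun` setup; since there is no backward pass, it makes no difference whether you pass in the DDP-wrapped model or the underlying module.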