Why is there no distributed inference?

Is there a way to enable distributed inference, rather than only distributed training? Also, is it possible to distribute the work across multiple servers, each with multiple GPUs, or does it only work for a single server with multiple GPUs? If any of these features are missing, will they be coming out soon?

Lastly, what would be the recommended environment / library to enable distributed inference on multiple servers each with multiple GPUs?



For a single server, you can use nn.DataParallel.
For multiple servers, the distributed package should have everything you need.
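
To make the single-server suggestion concrete, here is a minimal sketch of inference with nn.DataParallel. The model (a small nn.Linear) and the batch shape are placeholders I made up for illustration, not anything from the question. The wrapper is only applied when more than one GPU is visible, so the same script should also run on a CPU-only or single-GPU machine.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in model; replace with your own trained network.
model = nn.Linear(16, 4)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# DataParallel splits each input batch along dim 0 across the visible GPUs
# and gathers the outputs on the first device.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)

model.eval()  # switch layers such as dropout / batch norm to eval behavior

batch = torch.randn(32, 16, device=device)  # made-up input batch
with torch.no_grad():  # no autograd graph is needed for inference
    outputs = model(batch)
```

Note that this only helps when a whole batch is available at once, since the split happens per batch.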


I don’t get it. Everything in the distributed docs relates to training.

And DataParallel can be used for inference, sure, but for production it has little use if requests come at random times.

So far I have only used a single-server multi-GPU environment, but in principle DDP can be used at inference time, too.

Two things get in the way of using DDP at inference:

  1. synchronization during the backward pass
  2. DistributedSampler, which modifies the dataloader so that the number of samples is evenly divisible by the number of GPUs.

At inference, you don’t need the backward computation and you don’t want to modify the evaluation data.
You can avoid both problems by using a custom dataloader for evaluation, similar to this example.
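
As a rough illustration of what such custom evaluation loading could look like (this is my own sketch, not the linked example), each process can take a strided slice of the dataset indices without any padding. The helper name shard_indices is hypothetical. By default DistributedSampler repeats some samples so every rank gets the same count; the version below simply lets some ranks receive one sample fewer.

```python
def shard_indices(num_samples, rank, world_size):
    """Return the dataset indices handled by `rank`.

    Hypothetical helper: indices are assigned in a strided fashion
    (rank, rank + world_size, ...), with no padding and no duplicates,
    so the evaluation data is left unmodified.
    """
    if world_size <= 0 or not 0 <= rank < world_size:
        raise ValueError("rank must satisfy 0 <= rank < world_size")
    return list(range(rank, num_samples, world_size))


# Example: 10 samples spread over 4 processes.
shards = [shard_indices(10, rank, 4) for rank in range(4)]
for rank, indices in enumerate(shards):
    print(rank, indices)
# 0 [0, 4, 8]
# 1 [1, 5, 9]
# 2 [2, 6]
# 3 [3, 7]
```

In a PyTorch script, the list for the current rank could be passed to torch.utils.data.Subset to build that process's evaluation dataset, and the model can be called under torch.no_grad() so that no backward pass is involved.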

A related thread is here.

For those looking for a production inference service that allows for serving requests on models in parallel, you can check out TorchServe.
