Why is there no distributed inference?

Is there a way to enable distributed inference, instead of training? Also, is it possible to distribute the work across multiple servers, each with multiple GPUs, or does it only work for a single server with multiple GPUs? If any of these features are missing, will they be coming soon?

Lastly, what would be the recommended environment / library to enable distributed inference on multiple servers each with multiple GPUs?

Thanks!

Hi,

For a single server, you can use nn.DataParallel.
For multiple servers, the distributed package should have everything you need.
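
A minimal sketch of the single-server case (the model here is just a stand-in for your own trained network):

```python
import torch
import torch.nn as nn

# Placeholder model; substitute your own trained network.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
model = nn.DataParallel(model).cuda()   # replicate the model across all visible GPUs
model.eval()

with torch.no_grad():                                  # no autograd bookkeeping at inference
    batch = torch.randn(64, 128, device="cuda")        # dummy batch of 64 samples
    outputs = model(batch)                              # inputs are scattered across the GPUs,
                                                        # outputs are gathered back on cuda:0
print(outputs.shape)                                    # torch.Size([64, 10])
```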


I don’t get it. Everything in the distributed docs relates to training.

And DataParallel can be used for inference, sure, but it is of little use in production, where requests arrive at random times rather than in large, pre-assembled batches.

So far I have only used a single-server multi-GPU environment, but in principle DDP can be used at inference time, too.

What hinders using DDP at inference are:

  1. the gradient synchronization during the backward pass, and
  2. the DistributedSampler, which modifies the dataloader so that the number of samples is evenly divisible by the number of GPUs (duplicating samples if necessary).

At inference, you don’t need the backward computation and you don’t want to modify the evaluation data.
You can use a custom dataloader for evaluation, similar to this example, to avoid both problems (see the sketch below).
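
Here is a rough sketch of what I mean, assuming one process per GPU launched with torchrun, with a toy dataset and model as placeholders. Each rank scores a disjoint shard of the data, there is no backward pass, and there is no DistributedSampler, so no samples are duplicated:

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.utils.data import DataLoader, Dataset, Subset


class ToyDataset(Dataset):
    """Stand-in for the real evaluation data."""
    def __init__(self, n=1000):
        self.data = torch.randn(n, 128)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]


def main():
    # Launch with: torchrun --nproc_per_node=<num_gpus> infer.py
    dist.init_process_group("nccl")
    rank = dist.get_rank()
    world_size = dist.get_world_size()
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    dataset = ToyDataset()
    # Shard the data manually: rank r gets indices r, r + world_size, ...
    # Unlike DistributedSampler, this never pads or duplicates samples.
    indices = list(range(rank, len(dataset), world_size))
    loader = DataLoader(Subset(dataset, indices), batch_size=64)

    # Placeholder model; substitute your own trained network.
    model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
    model = model.cuda(local_rank)
    model.eval()

    results = []
    with torch.no_grad():            # no backward pass, so nothing to synchronize
        for batch in loader:
            results.append(model(batch.cuda(local_rank)).cpu())

    # Each rank writes its own shard; merge the files afterwards if needed.
    torch.save(torch.cat(results), f"predictions_rank{rank}.pt")
    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Note that since there is no backward pass, the DDP wrapper itself isn’t strictly needed here; the process group is only used to split the work and to tell each rank which shard is its own.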

A related thread is here.

For those looking for a production inference service that serves requests on models in parallel, check out TorchServe.
