Distributed inference

Hello, I want to implement distributed inference for large models across multiple GPUs on a single machine. However, I found that the torch.distributed APIs are mainly designed for training rather than inference. In other words, it is not convenient to use these APIs for distributed inference because of issues such as redundant gradient synchronization.
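For context, here is a minimal sketch of the kind of single-machine, multi-GPU inference I have in mind: a hypothetical two-stage model split across devices (names like `stage0`/`stage1` are just placeholders), run under `torch.no_grad()` so no autograd state or gradient sync is involved. It falls back to CPU when two GPUs are not available:

```python
import torch
import torch.nn as nn

# Pick two devices; fall back to CPU when fewer than two GPUs exist.
if torch.cuda.device_count() >= 2:
    dev0, dev1 = torch.device("cuda:0"), torch.device("cuda:1")
else:
    dev0 = dev1 = torch.device("cpu")

# Hypothetical two-stage split of a larger model across the devices.
stage0 = nn.Sequential(nn.Linear(16, 32), nn.ReLU()).to(dev0)
stage1 = nn.Linear(32, 4).to(dev1)

@torch.no_grad()  # inference only: no autograd graph, no gradient sync
def infer(x):
    h = stage0(x.to(dev0))      # run first stage on device 0
    return stage1(h.to(dev1))   # move activations, run second stage

out = infer(torch.randn(8, 16))
print(out.shape)  # torch.Size([8, 4])
```

This works, but it keeps only one stage busy at a time, which is part of why I am asking about better-suited approaches.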
Could you give me some suggestions on how to implement distributed inference in PyTorch? Thanks so much!