Infer multiple torch models on a single GPU

  • I am currently trying to run inference with 2 torch models on the same GPU, but I observe that if the 2 of them run at the same time in 2 different threads, the inference time is much larger than when each runs individually (a minimal sketch of this setup follows this list).
  • Is there any way to use a single GPU to run multiple models in parallel?
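
For concreteness, here is a minimal sketch of the setup described above, assuming two torchvision models, an arbitrary input shape, and one CUDA stream per thread (all of these are illustrative assumptions, not details from the original post). If either model already saturates the GPU on its own, the kernels from the two threads effectively time-slice rather than overlap, which is consistent with the slowdown observed; separate streams only help when there is spare capacity.

```python
import threading

import torch
import torchvision.models as models

# Hypothetical stand-ins for the two models from the question; architectures,
# batch size, and input shape are assumptions made purely for illustration.
device = torch.device("cuda:0")
model_a = models.resnet18(weights=None).to(device).eval()
model_b = models.resnet50(weights=None).to(device).eval()
batch = torch.randn(8, 3, 224, 224, device=device)

def run_inference(model, inputs, stream, n_iters=100):
    # Issue this thread's work on its own CUDA stream so the two models'
    # kernels *can* overlap when the GPU has spare capacity; if one model
    # already saturates the GPU, they will time-slice instead.
    with torch.inference_mode(), torch.cuda.stream(stream):
        for _ in range(n_iters):
            model(inputs)
    stream.synchronize()

threads = [
    threading.Thread(
        target=run_inference,
        args=(m, batch, torch.cuda.Stream(device=device)),
    )
    for m in (model_a, model_b)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
torch.cuda.synchronize()
```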

Reference:

The best approach here is to use an inference framework that handles this problem for you. pytorch/serve is an example; I work on it and can help answer any questions.
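
As a hedged illustration of the client side of that approach: the sketch below assumes a TorchServe instance is already running locally with both models registered under the hypothetical names `model_a` and `model_b`, and sends concurrent requests to TorchServe's predictions endpoint. Batching, scheduling, and GPU placement are then handled by the serving framework rather than by application threads.

```python
import concurrent.futures

import requests

# Assumes TorchServe is running locally; the model names ("model_a",
# "model_b") and the input file ("example_input.jpg") are placeholders.
INFERENCE_URL = "http://127.0.0.1:8080/predictions/{}"

def predict(model_name, payload_path):
    # Send one inference request; TorchServe decides how to batch and
    # schedule the work on the GPU across all registered models.
    with open(payload_path, "rb") as f:
        resp = requests.post(INFERENCE_URL.format(model_name), data=f)
    resp.raise_for_status()
    return resp.text

with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
    futures = {
        pool.submit(predict, name, "example_input.jpg"): name
        for name in ("model_a", "model_b")
    }
    for fut, name in futures.items():
        print(name, fut.result())
```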