Infer multiple torch models on a single GPU

  • I am currently trying to run inference with 2 torch models on the same GPU, but I observe that if the 2 of them run at the same time in 2 different threads, the inference time is much larger than when each runs individually (a minimal sketch of this setup follows this list).
  • Is there any way to use a single GPU to run multiple models in parallel?
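
For concreteness, here is a minimal sketch of the setup described above, assuming two torchvision models, an arbitrary input shape, and one CUDA stream per thread (all of these are illustrative assumptions, not details from the original post). If either model already saturates the GPU on its own, the kernels from the two threads effectively time-slice rather than overlap, which is consistent with the slowdown observed; separate streams only help when there is spare capacity.

```python
import threading

import torch
import torchvision.models as models

# Hypothetical stand-ins for the two models from the question; architectures,
# batch size, and input shape are assumptions made purely for illustration.
device = torch.device("cuda:0")
model_a = models.resnet18(weights=None).to(device).eval()
model_b = models.resnet50(weights=None).to(device).eval()
batch = torch.randn(8, 3, 224, 224, device=device)

def run_inference(model, inputs, stream, n_iters=100):
    # Issue this thread's work on its own CUDA stream so the two models'
    # kernels *can* overlap when the GPU has spare capacity; if one model
    # already saturates the GPU, they will time-slice instead.
    with torch.inference_mode(), torch.cuda.stream(stream):
        for _ in range(n_iters):
            model(inputs)
    stream.synchronize()

threads = [
    threading.Thread(
        target=run_inference,
        args=(m, batch, torch.cuda.Stream(device=device)),
    )
    for m in (model_a, model_b)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
torch.cuda.synchronize()
```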

Reference:

The best approach here is to use an inference framework that handles this problem for you. pytorch/serve is an example; I work on it and can help answer any questions.
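
As a hedged illustration of the client side of that approach: the sketch below assumes a TorchServe instance is already running locally with both models registered under the hypothetical names `model_a` and `model_b`, and sends concurrent requests to TorchServe's predictions endpoint. Batching, scheduling, and GPU placement are then handled by the serving framework rather than by application threads.

```python
import concurrent.futures

import requests

# Assumes TorchServe is running locally; the model names ("model_a",
# "model_b") and the input file ("example_input.jpg") are placeholders.
INFERENCE_URL = "http://127.0.0.1:8080/predictions/{}"

def predict(model_name, payload_path):
    # Send one inference request; TorchServe decides how to batch and
    # schedule the work on the GPU across all registered models.
    with open(payload_path, "rb") as f:
        resp = requests.post(INFERENCE_URL.format(model_name), data=f)
    resp.raise_for_status()
    return resp.text

with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
    futures = {
        pool.submit(predict, name, "example_input.jpg"): name
        for name in ("model_a", "model_b")
    }
    for fut, name in futures.items():
        print(name, fut.result())
```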