Infer multiple torch models on a single GPU

  • I am currently trying to run inference with two torch models on the same GPU, and I have observed that when both run at the same time in two different threads, inference takes much longer than running each model individually.
  • Is there any way to use a single GPU to run multiple models in parallel?


The best approach here is to use an inference-serving framework that handles this problem for you. pytorch/serve is one example; I work on it and can help answer any questions.
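
To illustrate part of what a serving layer does for you: instead of many threads calling a model directly (so their GPU kernels contend with each other), requests are typically routed through a per-model worker queue, which also makes batching possible. Below is a minimal sketch of that queueing pattern using only the Python standard library; the `ModelWorker` class is hypothetical and the "models" are plain functions standing in for real torch modules, so this is not pytorch/serve's actual API.

```python
# Sketch of the request-queue pattern a serving framework uses internally:
# each model gets one dedicated worker thread, and callers submit requests
# through a queue instead of invoking the model concurrently themselves.
import queue
import threading
from concurrent.futures import Future


class ModelWorker:
    """Serializes all inference calls for one model onto a single thread."""

    def __init__(self, model_fn):
        self.model_fn = model_fn
        self.requests = queue.Queue()
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        while True:
            item = self.requests.get()
            if item is None:  # shutdown sentinel
                break
            fut, x = item
            try:
                fut.set_result(self.model_fn(x))
            except Exception as exc:
                fut.set_exception(exc)

    def infer(self, x):
        """Submit a request; returns a Future the caller can wait on."""
        fut = Future()
        self.requests.put((fut, x))
        return fut

    def stop(self):
        self.requests.put(None)
        self.thread.join()


# Two placeholder "models" sharing the machine.
worker_a = ModelWorker(lambda x: x * 2)
worker_b = ModelWorker(lambda x: x + 100)

futures = [worker_a.infer(3), worker_b.infer(3)]
results = [f.result() for f in futures]
print(results)  # [6, 103]

worker_a.stop()
worker_b.stop()
```

A real serving framework adds batching (collecting several queued requests into one forward pass) and process-level workers on top of this, which is where most of the throughput gain on a single GPU comes from.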