- I am currently trying to run inference with two PyTorch models on the same GPU, but I observe that when both run at the same time in two different threads, the inference time is much larger than when each model runs individually (see the sketch after this list).
- Is there any way to use a single GPU to run multiple models in parallel?
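A minimal sketch of the setup being described, for reproduction purposes: the model (`resnet18`), batch shape, and iteration count are placeholder assumptions, not the actual models in question.

```python
import threading
import torch
import torchvision.models as models

device = torch.device("cuda")

# Placeholder models standing in for the two actual models.
model_a = models.resnet18(weights=None).eval().to(device)
model_b = models.resnet18(weights=None).eval().to(device)

def infer(model, batch, n_iters=100):
    # Repeated forward passes under no_grad, then a sync so the
    # elapsed wall time reflects completed GPU work.
    with torch.no_grad():
        for _ in range(n_iters):
            model(batch)
    torch.cuda.synchronize()

batch = torch.randn(8, 3, 224, 224, device=device)

# Running both threads concurrently is observed to be much slower
# than timing each infer() call on its own.
t1 = threading.Thread(target=infer, args=(model_a, batch))
t2 = threading.Thread(target=infer, args=(model_b, batch))
t1.start(); t2.start()
t1.join(); t2.join()
```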