Inference with Multiple Models on a Single GPU

I am trying to build a system and I need to run inference on 60 segmentation models at the same time (the same model, but with different inputs). I wonder if this is possible in PyTorch?
I am not sure what kind of system I need for this. I am planning to use 4x RTX 8000s, and if that is not enough I can use two systems with 4x RTX 8000s each, or a better GPU.

Would I lose too much performance because of using multiple models? How many models can I fit on a GPU, and what does that depend on? Is it just the GPU's VRAM, or also its processing speed? Sorry for asking such trivial questions, but I really couldn't find an answer by searching.

You can assume that I am going to use Yolact: https://github.com/dbolya/yolact
I would be very happy if you could help me. Thank you so much, and sorry for my English/grammar.

The simplest and probably most efficient method would be to concatenate your samples along dimension 0 (i.e. the batch dimension).
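Here is a minimal sketch of that idea; the tiny conv net is just a placeholder for Yolact, and the input shapes are illustrative (it assumes a CUDA device is available):

```python
import torch
import torch.nn as nn

# Placeholder network standing in for Yolact; the batching idea is identical.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 1, 1),
).cuda().eval()

# 60 inputs (e.g. one frame per robot camera); all must share the same shape.
inputs = [torch.randn(3, 256, 256) for _ in range(60)]
batch = torch.stack(inputs, dim=0).cuda()   # shape: (60, 3, 256, 256)

with torch.no_grad():                       # no autograd bookkeeping at inference
    outputs = model(batch)                  # one forward pass serves all 60 inputs

# outputs[i] is the prediction for inputs[i]
```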
If that is too much for one GPU, wrap your model in DistributedDataParallel and let it handle the batched data (sketch below).
Do not use multiple models unless they hold different parameters; it's unnecessary.
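A rough sketch of the DistributedDataParallel route, with one process per GPU; the placeholder model, the address/port, and the shapes are all assumptions, not Yolact specifics:

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def run(rank, world_size, all_inputs):
    # Minimal single-node setup; address and port are placeholder values.
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    # Tiny conv net standing in for Yolact; each process holds one replica.
    model = nn.Conv2d(3, 1, 3, padding=1).cuda(rank)
    model = DDP(model, device_ids=[rank]).eval()

    # Each rank takes its slice of the 60 samples (15 per GPU on a 4-GPU box).
    shard = all_inputs[rank::world_size].cuda(rank)
    with torch.no_grad():
        out = model(shard)          # forward pass on this GPU's shard

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = torch.cuda.device_count()   # e.g. 4 on a 4x RTX 8000 machine
    inputs = torch.randn(60, 3, 256, 256)    # dummy stand-ins for camera frames
    mp.spawn(run, args=(world_size, inputs), nprocs=world_size)
```

For pure inference you could also skip the DDP wrapper entirely and just run one plain copy of the model per GPU in separate processes; DDP mainly earns its keep once gradients need synchronizing.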

That makes sense. I was planning to use 30 robotic arms, with 2 models for each, so that every robotic arm gets information about what to do at the same time. I think I can use that approach. I've read that running multiple models is possible in TensorFlow, which is why I wanted to know whether it is possible in PyTorch. Thanks again for the answer.