I am trying to build a system that needs to run inference on 60 segmentation models at the same time (the same model, but with different inputs). Is this possible to do in PyTorch?
I am not sure what kind of system I need for this. I am planning to use 4x RTX 8000s, and if that is not enough I can use two systems with 4x RTX 8000 each, or a better GPU.
Would I lose too much performance because of running multiple models? How many models can I fit on one GPU, and what does that depend on? Is it just the GPU's VRAM, or its processing speed as well? Sorry for asking such trivial questions, but I really couldn't come up with an answer by searching.
You can assume that I am going to use YOLACT: https://github.com/dbolya/yolact
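To make the setup concrete, here is a minimal sketch of what I have in mind: several copies of the same model, each fed a different input, spread round-robin over whatever GPUs are available. I am using a tiny dummy network in place of YOLACT here purely for illustration; the real system would load the actual YOLACT weights instead.

```python
import torch
import torch.nn as nn

# Stand-in for YOLACT: a tiny segmentation-style network.
# (Illustration only -- the real model would be loaded from a checkpoint.)
def make_model():
    return nn.Sequential(
        nn.Conv2d(3, 8, 3, padding=1),
        nn.ReLU(),
        nn.Conv2d(8, 1, 1),  # one "mask" channel
    )

num_models = 6  # would be 60 in the real system

# Use all visible GPUs, or fall back to CPU if none are available.
devices = (
    [torch.device(f"cuda:{i}") for i in range(torch.cuda.device_count())]
    or [torch.device("cpu")]
)

# Place the model copies round-robin across the available devices.
models = [
    make_model().to(devices[i % len(devices)]).eval()
    for i in range(num_models)
]

# One input per model: same architecture, different inputs.
inputs = [torch.randn(1, 3, 64, 64) for _ in range(num_models)]

with torch.no_grad():
    outputs = [
        m(x.to(next(m.parameters()).device))
        for m, x in zip(models, inputs)
    ]

print([tuple(o.shape) for o in outputs])
```

This loops over the models sequentially, so on one GPU the copies would queue up rather than truly run in parallel; that is exactly the part I am unsure how to scale to 60 models.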
I would be very happy if you could help me. Thank you so much, and sorry for my English/grammar.