I am trying to compare the performance of TorchServe and Ray Serve. I started with Ray Serve, where I can decide how many replicas of a given model to launch at the same time. With TorchServe I have successfully served a model, but I cannot find how to deploy multiple replicas of the same model on a single machine. For example, if I have 2 GPUs and my model uses 1 GPU, why not deploy the model twice so that it can handle more requests?
Edit: after using it, it seems that TorchServe does use replicas, but I just can't control how many.
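To make the question concrete, here is a rough sketch of what I am hoping to do. As far as I understand, TorchServe calls replicas "workers" and exposes a management API (by default on port 8081) with a scale call of the form `PUT /models/{model_name}?min_worker=N`. The address, the model name `my_model`, and the helper names below are my own assumptions for illustration, and I have not verified how the workers are then spread across the GPUs.

```python
import urllib.parse
import urllib.request

# Assumption: TorchServe management API on its default local address.
MANAGEMENT_URL = "http://localhost:8081"


def scale_workers_url(model_name, min_worker, max_worker=None,
                      synchronous=True, base=MANAGEMENT_URL):
    """Build the URL for a 'scale workers' request (hypothetical helper)."""
    params = {
        "min_worker": min_worker,
        # Ask the server to reply only once the workers are up.
        "synchronous": str(synchronous).lower(),
    }
    if max_worker is not None:
        params["max_worker"] = max_worker
    query = urllib.parse.urlencode(params)
    return f"{base}/models/{urllib.parse.quote(model_name)}?{query}"


def scale_workers(model_name, min_worker, **kwargs):
    """Send the PUT request; needs a running TorchServe instance."""
    url = scale_workers_url(model_name, min_worker, **kwargs)
    request = urllib.request.Request(url, method="PUT")
    with urllib.request.urlopen(request) as response:
        return response.status, response.read().decode()
```

For my case this would be something like `scale_workers("my_model", 2)`, i.e. one worker per GPU, if that is indeed how the workers get assigned.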
Thank you in advance for your help.