I am trying to run a PyTorch object detector using Triton server. I used tracing for the model and scripting for the post-processing function. On a single GPU, the TorchScript runs smoothly on the server.
But in the multi-GPU case, I am not able to run the compiled script: the scripted post-process function memorizes the GPU id (`cuda:0`) that I used when compiling the TorchScript and expects all tensor operations to be performed on that device. This invariably fails when the Triton server passes any other CUDA device.
Is there any workaround for this?
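To make the failure mode concrete, here is a minimal sketch (function names and the threshold logic are illustrative, not the actual post-process code): writing the device as a literal freezes it into the scripted graph, while deriving it from the input tensor keeps the function device-agnostic.

```python
import torch

# Pitfall sketch: the device literal is frozen into the TorchScript graph,
# so every call expects its tensors on cuda:0.
@torch.jit.script
def postprocess_pinned(boxes: torch.Tensor) -> torch.Tensor:
    offset = torch.ones(4, device=torch.device("cuda:0"))  # hardcoded device
    return boxes + offset  # fails when `boxes` lives on cuda:1

# Device-agnostic rewrite: derive the device from the incoming tensor, so the
# same scripted function runs on whichever GPU Triton places the instance.
@torch.jit.script
def postprocess_agnostic(boxes: torch.Tensor) -> torch.Tensor:
    offset = torch.ones(4, device=boxes.device)
    return boxes + offset
```

Any constant tensors created inside the scripted function (thresholds, anchors, offsets) should follow the same pattern, or be moved to the input's device with `.to(boxes.device)` before use.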
How are you defining the multi-GPU use case? Could you explain your deployment and where it’s currently breaking?
In the multi-GPU use case I start one Triton server on two GPUs and place one instance of the TorchScript model on each GPU via the Triton config.
I am using Triton server. What other information is required?
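For reference, placing one model instance on each GPU is done in the model's `config.pbtxt` roughly like this (the GPU ids are illustrative):

```
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0, 1 ]
  }
]
```

With `gpus` listing both devices, Triton creates one instance per listed GPU, which is exactly why the second instance receives tensors on `cuda:1`.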
- The Triton server is running on
- TorchScript was compiled using
The error is caused by the model instance on `cuda:1`: it fails with the error message `expected device cuda:1 but got device cuda:0`.
Would it work if you write the PyTorch model device-agnostic, i.e. keep using `cuda:0` as now and mask the other GPUs with `CUDA_VISIBLE_DEVICES`, so that each process only sees a single GPU?
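The masking idea can be sketched as follows (the function name is hypothetical): hide all but one physical GPU before CUDA is initialized, so the remaining GPU is exposed to the process as `cuda:0` and the scripted `cuda:0` references resolve correctly.

```python
import os

def mask_to_single_gpu(gpu_id: int) -> None:
    # Restrict this process to one physical GPU; CUDA renumbers the visible
    # device as cuda:0 regardless of its physical id.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    # torch / the TorchScript model must be initialized only after this point,
    # e.g. `import torch; model = torch.jit.load(...)`.
```

Note this only helps if each GPU is served by its own process (e.g. one server process per GPU); within a single multi-GPU server, making the scripted post-process device-agnostic is the more direct fix.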