Running TorchScript on multiple GPUs

I am trying to run a PyTorch object detector using Triton server. I used tracing for the model and scripting for the post-processing function. On a single GPU the TorchScript model runs smoothly on the server.

But in the multi-GPU case I am not able to run the compiled script. This is because the scripted post-process function memorizes the GPU id (cuda:0) that I used when compiling the TorchScript, and expects all tensor operations to be performed on that device. This invariably fails when the Triton server passes any other CUDA device.

Is there any workaround for this?
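To illustrate, a minimal sketch of the pattern I mean (the function name and threshold are made up for the example, not my actual post-process): creating tensors with a hardcoded device inside a scripted function freezes cuda:0 into the script, while deriving the device from the input tensor keeps it portable:

```python
import torch

# Hypothetical, simplified post-process; names and logic are illustrative.
# BAD: the device is hardcoded into the script, so this always allocates
# on cuda:0 even when the input tensor lives on cuda:1.
@torch.jit.script
def post_process_hardcoded(scores: torch.Tensor) -> torch.Tensor:
    thresh = torch.tensor(0.5, device=torch.device("cuda:0"))
    return scores > thresh

# Device-agnostic: derive the device from the input tensor, so the same
# script runs on whichever GPU the model instance is placed on.
@torch.jit.script
def post_process(scores: torch.Tensor) -> torch.Tensor:
    thresh = torch.tensor(0.5, device=scores.device)
    return scores > thresh
```

The second version also works unchanged on CPU, since it never names a device explicitly.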

How are you defining the multi-GPU use case? Could you explain your deployment and where it’s currently breaking?

In the multi-GPU use case I start one Triton server on two GPUs and place one instance of the TorchScript model on each GPU using the Triton config.
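The placement is done with an `instance_group` entry in the model's `config.pbtxt`, roughly like this (counts and GPU ids shown here are just the shape of my setup):

```
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0, 1 ]
  }
]
```

With this, Triton creates one instance of the model on each listed GPU.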

I am using Triton server. What other information is required?

  • The Triton server is running on cuda:0 and cuda:1
  • The TorchScript was compiled using cuda:0

The model instance on cuda:1 of the server fails with the error message `expected device cuda:1 but got device cuda:0`.

Would it work if you wrote the PyTorch model in a device-agnostic way, i.e. kept using cuda:0 as now, and masked the other GPUs with CUDA_VISIBLE_DEVICES?
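A sketch of that masking approach, assuming you run one `tritonserver` process per physical GPU (the ports are placeholders; check the flags against your Triton version): each process sees only one GPU, which CUDA then enumerates as cuda:0 inside that process, so the cuda:0 assumption baked into the script holds.

```shell
# One Triton process per physical GPU; inside each process the single
# visible GPU appears as cuda:0, matching the compiled TorchScript.
# Ports are illustrative placeholders.
CUDA_VISIBLE_DEVICES=0 tritonserver --model-repository=/models \
    --http-port 8000 --grpc-port 8001 --metrics-port 8002 &
CUDA_VISIBLE_DEVICES=1 tritonserver --model-repository=/models \
    --http-port 9000 --grpc-port 9001 --metrics-port 9002 &
```

The trade-off is that you lose single-process multi-GPU scheduling and need a load balancer (or client-side routing) in front of the two servers.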