Hi,
I am trying to run a PyTorch object detector on Triton Inference Server. I used tracing for the model and scripting for the post-processing function. On a single GPU, the TorchScript runs smoothly on the server.
But in the multi-GPU case I am not able to run the compiled script. The scripted post-process function has memorized the GPU id (cuda:0) on which I ran the TorchScript and expects all tensor operations to be performed on that device. This invariably fails when the Triton server assigns the model instance to any other CUDA device.
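A minimal sketch of the usual fix (the function name and shapes are hypothetical, not my actual post-process): inside the scripted function, derive the device from an incoming tensor via `tensor.device` instead of writing `cuda:0` anywhere, so any tensors created inside follow whichever GPU the input arrives on.

```python
import torch

@torch.jit.script
def postprocess(boxes: torch.Tensor) -> torch.Tensor:
    # Create any new tensors on the input's device, never a hardcoded
    # "cuda:0"; this keeps the scripted function device-agnostic.
    offset = torch.ones(boxes.shape[-1], device=boxes.device)
    return boxes + offset

# Runs on CPU here; on the server it follows the input tensor's GPU.
out = postprocess(torch.zeros(2, 4))
print(out.tolist())
```

Note that this only helps for scripted code; a *traced* function bakes the device of the example inputs into the graph, which is why device-dependent logic generally belongs in the scripted part.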