Multithreading on a single GPU with a single model for inference


I want to load a PyTorch model onto a single GPU and run multiple inferences concurrently via multithreading on that one CUDA device, collecting the result of each inference. Is this possible, and if so, could you guide me on how to do it? My motivation is that a single inference does not fully utilize the GPU, so I am hoping that multithreading would let me drive the CUDA device closer to 100% utilization.
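Here is a rough sketch of the threading pattern I have in mind. The function and model names are placeholders of my own, and I have substituted a plain-Python stub for my real network so the snippet runs standalone; the comments mark where the actual PyTorch calls would go.

```python
from concurrent.futures import ThreadPoolExecutor


def run_parallel_inference(model, batches, num_workers=4):
    """Run `model` on each batch from a shared thread pool.

    For a real PyTorch model, the module would be loaded once, moved to
    the GPU, and put in eval mode before the pool starts, and each call
    would run under `torch.no_grad()` so no thread mutates state:
        net = torch.load("model.pt").to("cuda").eval()   # hypothetical path
        model = lambda x: net(x.to("cuda"))              # inside no_grad
    """
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # pool.map preserves input order, so results line up with batches
        return list(pool.map(model, batches))


# Stub standing in for the loaded network, so the pattern is testable here.
def stub_model(batch):
    return [v * 2 for v in batch]


results = run_parallel_inference(stub_model, [[1, 2], [3, 4]], num_workers=2)
print(results)  # [[2, 4], [6, 8]]
```

My uncertainty is whether this actually helps on one device: as far as I understand, kernels launched from different threads onto the same CUDA device still serialize on the default stream, so I am not sure the threads alone would raise utilization.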