Multiple models inference time on the same GPU

I used two processes to load two models on a single GPU, but I found that the inference time for one process with one model is almost the same as for two processes with two models. I just want to know how to run two models so that their inference happens in parallel on a single GPU.

AFAIK GPUs are asynchronous machines, so you shouldn't need to manage multiple processes submitting work to the GPU. Just load the models as normal and let PyTorch and NVIDIA handle scheduling the work efficiently.
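For completeness, one way to *attempt* overlap from a single process is to issue each model's forward pass on its own CUDA stream. This is only a sketch, and it only helps when each individual kernel leaves the GPU's SMs underutilized; if one model's kernels already saturate the GPU, the streams will effectively serialize and you won't see a speedup (which may be exactly what you observed with two processes). The model sizes and shapes below are arbitrary placeholders, and the code falls back to plain sequential CPU execution when CUDA is unavailable.

```python
# Sketch: try to overlap two models' inference on one GPU using CUDA streams.
# Assumption: each model is small enough that its kernels don't saturate the GPU;
# otherwise the streams serialize and timing looks the same as sequential runs.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# Two placeholder models standing in for the real ones.
model_a = nn.Linear(128, 64).to(device).eval()
model_b = nn.Linear(128, 64).to(device).eval()
x = torch.randn(32, 128, device=device)

with torch.no_grad():
    if device == "cuda":
        stream_a = torch.cuda.Stream()
        stream_b = torch.cuda.Stream()
        # Kernels launched on different streams *may* execute concurrently.
        with torch.cuda.stream(stream_a):
            out_a = model_a(x)
        with torch.cuda.stream(stream_b):
            out_b = model_b(x)
        torch.cuda.synchronize()  # wait for both streams before using results
    else:
        # No GPU available: just run sequentially on CPU.
        out_a = model_a(x)
        out_b = model_b(x)

print(tuple(out_a.shape), tuple(out_b.shape))
```

If you do stick with separate processes, NVIDIA's Multi-Process Service (MPS) is the usual mechanism for letting kernels from different processes share the GPU concurrently; without it, kernels from different processes are time-sliced rather than truly overlapped.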