Inference time for multiple models on the same GPU

AFAIK GPUs are asynchronous machines, so you shouldn't need to manage multiple processes to share the GPU. Just load each model onto the device as normal and let PyTorch and the NVIDIA driver handle the scheduling.
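
For example, here's a minimal sketch of what I mean (the torchvision resnets are just stand-ins for your own models). Kernel launches return immediately on the host side, so the CPU can queue work for both models and the GPU works through it without you coordinating anything:

```python
import torch
import torchvision.models as models

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load both models onto the same GPU; CUDA manages the device memory
# and schedules the kernels from both.
model_a = models.resnet18(weights=None).to(device).eval()
model_b = models.resnet50(weights=None).to(device).eval()

x = torch.randn(8, 3, 224, 224, device=device)

with torch.inference_mode():
    # Launches are asynchronous with respect to the host: these calls
    # enqueue work and return before the GPU has finished computing.
    out_a = model_a(x)
    out_b = model_b(x)

# Block only when you actually need the results on the CPU.
if device.type == "cuda":
    torch.cuda.synchronize()
print(out_a.shape, out_b.shape)
```

Both forward passes here go through the default CUDA stream, so they execute in order on the device; the point is you don't have to orchestrate that yourself, it just works.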