Deploying multiple PyTorch models on a single GPU

I can tell that someone was watching Tesla AI day. :slight_smile:

You can deploy as many models as your GPU has memory to hold. During training, though, a model can require 4-5x more GPU RAM (for gradients, optimizer states, and activations) along with a corresponding increase in compute time, since the model is doing optimization. So you might be better off allocating GPUs accordingly via a distributed training workflow.

Either way, make sure all of your models together fit on the deployment GPU. One way to check this is to instantiate the 10 models on your GPU, untrained and in eval mode, and run a forward pass. Granted, the inference won't be any good since you haven't trained them yet, but it will show whether you need to adjust model sizes before training. Keep a memory buffer of 10-15% below the GPU's max memory.
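Here is a minimal sketch of that check. `make_model` is a hypothetical placeholder for your actual architectures, and the batch shape is just an assumption; swap in your real models and input sizes:

```python
import torch
import torch.nn as nn

# Placeholder for your real architecture (assumption, not your actual model).
def make_model():
    return nn.Sequential(nn.Linear(512, 1024), nn.ReLU(), nn.Linear(1024, 10))

device = torch.device("cuda")

# Instantiate all 10 models untrained, move them to the GPU, set eval mode.
models = [make_model().to(device).eval() for _ in range(10)]

# Dummy batch just to exercise a forward pass; shape is an assumption.
x = torch.randn(32, 512, device=device)

with torch.no_grad():  # inference only, no autograd buffers
    for m in models:
        _ = m(x)

# Compare peak allocated memory against the device's total memory.
total = torch.cuda.get_device_properties(device).total_memory
peak = torch.cuda.max_memory_allocated(device)
print(f"Peak allocated: {peak / 1e9:.2f} GB of {total / 1e9:.2f} GB "
      f"({peak / total:.0%}); aim to stay 10-15% below the max.")
```

Note that `torch.cuda.max_memory_allocated` only tracks tensors allocated through PyTorch's caching allocator, so the true footprint (CUDA context, cuDNN workspaces) will be somewhat higher. That's another reason to keep the 10-15% buffer.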