Several models loaded but just one at a time in GPU

Hey guys,

I am building a website for comparing the mask outputs of several models. However, these models are very large, and only one model fits in my GPU at a time.

I am passing the image through the models sequentially.

Is there a way to move a model to the GPU, run inference, and then move the model back to the CPU?


What comes to my mind (and I have done something similar) is to prepare a dict that looks like:

models = {'model1': Model_1, 'model2': Model_2, ...}

where Model_X is a PyTorch model with its weights loaded, sitting on the CPU.

During the run I would retrieve the models sequentially, taking one at a time from the dict: move it to the GPU, run inference, move it back to the CPU, clear the GPU cache, load the next model, and so on.

The downside is definitely the time spent moving models to and from the GPU; repeated transfers will add noticeable latency.

I have found a better approach, @bonzogondo:

models = [...]  # define a list of all models, kept on the CPU

input = ...  # get your input
for model in models:'cuda')
    pred = model(input)
    # do something with pred
    del pred'cpu')
    torch.cuda.empty_cache()

The key fact is that you need to delete the prediction variable if you don't want CUDA out-of-memory issues.


This would be a good use case for TorchServe, which supports serving multiple models. Please give that a try and provide your feedback.