A couple of models in production

Hi everyone,

I’m trying to use a couple of different models simultaneously in one big algorithm. Is it possible to run them on a single GPU at the same time? Also, as far as I can see, it’s kind of hard to deploy PyTorch models in a production pipeline. Should I turn to ONNX + Caffe2?

Thanks,
Anton


It should be possible to run different models on the same GPU; however, I think you could lose a lot of performance, since the models would have to wait for each other to finish processing.

Multiprocessing might help, but I’m not really familiar with all of its limitations.

What kind of deployment environment do you have?
You could easily set up a web server using Flask or any other framework and serve your models there (sketch below).
If you need a lot of throughput on a local machine, I would go for ONNX and Caffe2.
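
A minimal sketch of the Flask idea (the tiny placeholder model, the weights.pt path, and the /predict endpoint are just my assumptions, not fixed names):

import torch
import torch.nn as nn
from flask import Flask, request, jsonify

app = Flask(__name__)

# Placeholder model -- substitute your own architecture and load your state_dict.
model = nn.Sequential(nn.Linear(4, 2))
# model.load_state_dict(torch.load("weights.pt", map_location="cpu"))
model.eval()

@app.route("/predict", methods=["POST"])
def predict():
    # Assumption: the client POSTs JSON like {"input": [[...]]} matching the model's input size.
    X = torch.tensor(request.get_json()["input"], dtype=torch.float)
    with torch.no_grad():
        pred = torch.max(model(X), 1)[1]
    return jsonify({"prediction": pred.tolist()})

if __name__ == "__main__":
    app.run()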

PyTorch 1.0 will support easy deployment with Caffe2 as stated here. You would have to wait a few months though, because that version is scheduled to be released this summer/autumn, as far as I know.


however, I think you could lose a lot of performance, since the models would have to wait for each other to finish processing.

What kind of deployment environment do you have?

I used a local machine, and it was OK for other frameworks. E.g., in TensorFlow I created two different sessions and used them in the “production pipeline”. So each session was used individually, and they were manually allocated in memory. Is it possible to do the same trick with PyTorch or ONNX + Caffe2?

PyTorch 1.0 will support easy deployment with Caffe2 as stated here.

As I understand it, release 1.0 will just simplify the ONNX + Caffe2 conversion. Am I wrong, and will it be a different solution? Anyway, I would like to find an alternative 1.0 solution now, if possible :slight_smile:

You could just create several different models, push them onto the GPU, and feed them your data (see the sketch below).
I suppose TensorFlow does the same.
However, since the GPU has limited resources, your performance might suffer, as the models might have to wait for each other.
You could try using the CPU instead, if you have single input images, for example.
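
A minimal sketch of that idea, with two torchvision models standing in purely as placeholders for your own:

import torch
import torchvision.models as models

device = torch.device('cuda')

# Placeholder models -- substitute your own architectures and weights.
model1 = models.resnet18().to(device).eval()
model2 = models.resnet18().to(device).eval()

x = torch.randn(1, 3, 224, 224, device=device)  # dummy input batch
with torch.no_grad():
    out1 = model1(x)  # both models share the same GPU's memory and compute
    out2 = model2(x)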

As a side note: Glow might be interesting for you.

Yeah, that’s also how I understand it.

So, as I understand it:

  1. I save a model this way:
# ... some code here
torch.save(model.state_dict(), "{}.pt".format(output_name))
# NOTE: output_name is modelX_name in the next steps
  2. Then I load the models this way:
model1 = ...  # your model1, e.g. model1 = CatDogClassifier().to('cuda')
model2 = ...  # your model2
model1.load_state_dict(torch.load(model1_name))
model2.load_state_dict(torch.load(model2_name))
  3. And if I want to make a prediction, I simply type:
X = ...  # some data,
# e.g. X = torch.tensor(X, requires_grad=False, dtype=torch.float).to('cuda')
out1 = torch.max(model1(X), 1)[1]
# and for a NumPy output, if you use 'cuda':
# out1 = torch.max(model1(X), 1)[1].cpu().numpy()

Please correct me if I’m wrong, and can you provide something like pseudocode if I’ve missed something?

Looks perfectly fine!
Your model might not be located at torch.Model(), but I assume that’s just a typo.

You should definitely check whether the CPU won’t be faster for single inputs.
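
For single inputs, that check is just a device change, e.g. (a quick sketch; the placeholder model and input shape are assumptions):

import torch
import torchvision.models as models

model1 = models.resnet18().eval()      # placeholder; use your loaded model, kept on the CPU
X = torch.randn(1, 3, 224, 224)        # a single input tensor on the CPU
with torch.no_grad():
    out1 = torch.max(model1(X), 1)[1]  # predicted class index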

Thanks. I’ll try it ASAP.

So, in my typo model1 = torch.Model(), I loaded a model as its class. Is it possible to avoid this step?
Also, load_state_dict raises an error about unexpected and missing keys.

I’m not sure how else you would like to get your predictions.
A model might be the easiest solution.
Why do you want to avoid it?

The state_dict error is thrown if you save a model and change its architecture afterwards.
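
A minimal illustration of how this error arises (the layer sizes here are made up):

import torch
import torch.nn as nn

# Save a state_dict from one architecture...
old_model = nn.Sequential(nn.Linear(10, 5))
torch.save(old_model.state_dict(), "model.pt")

# ...then try to load it into a changed architecture.
new_model = nn.Sequential(nn.Linear(10, 5), nn.Linear(5, 2))
new_model.load_state_dict(torch.load("model.pt"))
# raises a RuntimeError about missing keys ("1.weight", "1.bias")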

I was just trying to avoid long dependencies.

upd. I solved the problem with state_dict. That was my mistake :slight_smile:

Note for hackers who look through this post later: I edited the post a little with the solution, so don’t get confused by my or @ptrblck’s later comments :slight_smile:

I’m sorry, I don’t understand how to run two models at the same time. Could your code do this?