A couple of models in production

Hi everyone,

I’m trying to use a couple of different models simultaneously in one big algorithm. Is it possible to run them on a single GPU at the same time? Also, as far as I can see, it’s kind of hard to deploy PyTorch models in a production pipeline. Should I turn to ONNX + Caffe2?

Thanks,
Anton


It should be possible to run different models on the same GPU; however, I think you could lose a lot of performance, since the models would have to wait for each other to finish processing.

Multiprocessing might help, but I’m not really familiar with all of its limitations.

What kind of deployment environment do you have?
You could easily set up a web server using Flask or any other framework and serve your models there (sketch below).
If you need a lot of throughput on a local machine, I would go for ONNX and Caffe2.
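
A minimal sketch of the Flask idea (the tiny placeholder model, the weights.pt path, and the /predict endpoint are just my assumptions, not fixed names):

import torch
import torch.nn as nn
from flask import Flask, request, jsonify

app = Flask(__name__)

# Placeholder model -- substitute your own architecture and load your state_dict.
model = nn.Sequential(nn.Linear(4, 2))
# model.load_state_dict(torch.load("weights.pt", map_location="cpu"))
model.eval()

@app.route("/predict", methods=["POST"])
def predict():
    # Assumption: the client POSTs JSON like {"input": [[...]]} matching the model's input size.
    X = torch.tensor(request.get_json()["input"], dtype=torch.float)
    with torch.no_grad():
        pred = torch.max(model(X), 1)[1]
    return jsonify({"prediction": pred.tolist()})

if __name__ == "__main__":
    app.run()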

PyTorch 1.0 will support easy deployment with Caffe2 as stated here. You would have to wait a few months though, because that version is scheduled to be released this summer/autumn, as far as I know.


however, I think you could lose a lot of performance, since the models would have to wait for each other to finish processing.

What kind of deployment environment do you have?

I used a local machine, and it was OK for other frameworks. E.g., in TensorFlow I created two different sessions and used them in the “production pipeline”. So each session was used individually, and they were manually allocated in memory. Is it possible to do the same trick with PyTorch or ONNX + Caffe2?

PyTorch 1.0 will support easy deployment with Caffe2 as stated here.

As I understand it, release 1.0 will just simplify the ONNX + Caffe2 conversion. Am I wrong, and will it be a different solution? Anyway, I would like to find an alternative 1.0 solution now, if possible :slight_smile:

You could just create several different models, push them onto the GPU, and feed them your data (see the sketch below).
I suppose TensorFlow does the same.
However, since the GPU has limited resources, your performance might suffer, as the models might have to wait for each other.
You could try using the CPU instead, if you have single input images, for example.
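
A minimal sketch of that idea, with two torchvision models standing in purely as placeholders for your own:

import torch
import torchvision.models as models

device = torch.device('cuda')

# Placeholder models -- substitute your own architectures and weights.
model1 = models.resnet18().to(device).eval()
model2 = models.resnet18().to(device).eval()

x = torch.randn(1, 3, 224, 224, device=device)  # dummy input batch
with torch.no_grad():
    out1 = model1(x)  # both models share the same GPU's memory and compute
    out2 = model2(x)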

As a side note: Glow might be interesting for you.

Yeah, that’s also how I understand it.

So, as I understand it:

  1. I save a model this way:
# ... some code here
torch.save(model.state_dict(), "{}.pt".format(output_name))
# NOTE: output_name is modelX_name in the next steps
  2. Then I load the models this way:
model1 = ...  # your model1, e.g. model1 = CatDogClassifier().to('cuda')
model2 = ...  # your model2
model1.load_state_dict(torch.load(model1_name))
model2.load_state_dict(torch.load(model2_name))
  3. And if I want to make a prediction, I simply type:
X = ...  # some data,
# e.g. X = torch.tensor(X, requires_grad=False, dtype=torch.float).to('cuda')
out1 = torch.max(model1(X), 1)[1]
# and for a NumPy output, if you use 'cuda':
# out1 = torch.max(model1(X), 1)[1].cpu().numpy()

Please correct me if I’m wrong, and can you provide something like pseudocode if I’ve missed something?

Looks perfectly fine!
Your model might not be located at torch.Model(), but I assume that’s just a typo.

You should definitely check whether the CPU won’t be faster for single inputs.
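
For single inputs, that check is just a device change, e.g. (a quick sketch; the placeholder model and input shape are assumptions):

import torch
import torchvision.models as models

model1 = models.resnet18().eval()      # placeholder; use your loaded model, kept on the CPU
X = torch.randn(1, 3, 224, 224)        # a single input tensor on the CPU
with torch.no_grad():
    out1 = torch.max(model1(X), 1)[1]  # predicted class index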

Thanks. I’ll try it ASAP.

So, in my typo model1 = torch.Model(), I loaded a model as its class. Is it possible to avoid this step?
Also, load_state_dict raises an error about unexpected and missing keys.

I’m not sure how else you would like to get your predictions.
A model might be the easiest solution.
Why do you want to avoid it?

The state_dict error is thrown if you save a model and change its architecture afterwards.
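
A minimal illustration of how this error arises (the layer sizes here are made up):

import torch
import torch.nn as nn

# Save a state_dict from one architecture...
old_model = nn.Sequential(nn.Linear(10, 5))
torch.save(old_model.state_dict(), "model.pt")

# ...then try to load it into a changed architecture.
new_model = nn.Sequential(nn.Linear(10, 5), nn.Linear(5, 2))
new_model.load_state_dict(torch.load("model.pt"))
# raises a RuntimeError about missing keys ("1.weight", "1.bias")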

I was just trying to avoid long dependencies.

upd. I solved the problem with state_dict. That was my mistake :slight_smile:

Note for hackers who look through this post later: I edited the post a little with the solution, so don’t get confused by my or @ptrblck’s later comments :slight_smile:

I’m sorry, I don’t understand how to run two models at the same time. Could your code do this?