I’m trying to use a couple of different models simultaneously in one big algorithm. Is it possible to run them on a single GPU at the same time? Also, it seems kind of hard to deploy PyTorch models in a production pipeline. Should I turn to ONNX + Caffe2?
It should be possible to run different models on the same GPU; however, you might lose a lot of performance, since the models would have to wait for each other to finish processing.
Maybe multiprocessing could help, but I’m not really familiar with all its limitations.
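To make the first point concrete, here is a minimal sketch of two models sharing one device. The layer sizes are made up for illustration; on a real GPU both sets of kernels are queued to the same device and may serialize, which is where the performance loss would come from:

```python
import torch
import torch.nn as nn

# Toy stand-ins for the two real models (hypothetical sizes).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model_a = nn.Linear(10, 2).to(device)
model_b = nn.Linear(10, 5).to(device)

x = torch.randn(4, 10, device=device)

with torch.no_grad():
    out_a = model_a(x)  # both models live on the same device
    out_b = model_b(x)  # their kernels compete for the same GPU resources

print(out_a.shape, out_b.shape)
```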
What kind of deployment environment do you have?
You could easily set up a webserver using Flask or any other framework and serve your models there.
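A minimal Flask sketch of such a server, with a placeholder `predict` function standing in for the real model call (route name and payload shape are assumptions):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def predict(values):
    # Placeholder for the real model call,
    # e.g. torch.max(model1(torch.tensor(values)), 1)[1]
    return [v * 2 for v in values]

@app.route("/predict", methods=["POST"])
def predict_route():
    data = request.get_json()
    return jsonify(prediction=predict(data["values"]))

# app.run(host="0.0.0.0", port=5000)  # start the dev server
```

Clients would then POST JSON like `{"values": [1, 2]}` to `/predict` and get the prediction back as JSON.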
If you need a lot of throughput on a local machine, I would go for ONNX and Caffe2.
PyTorch 1.0 will support easy deployment with Caffe2, as stated here. You would have to wait a few months though, since that version is scheduled for release this summer/autumn as far as I know.
> however I think you could lose a lot of performance, since the models would have to wait for each other to finish the processing.
> What kind of deployment environment do you have?
I use a local machine, and it was OK for other frameworks. E.g., in TensorFlow I created two different sessions and used them in the “production pipeline”, so each session was used individually and manually allocated in memory. Is it possible to pull off the same trick with PyTorch or ONNX + Caffe2?
> PyTorch 1.0 will support easy deployment with Caffe2 as stated here.
As I understand it, release 1.0 will just simplify the ONNX + Caffe2 conversion. Am I wrong, or will it be a different solution? Either way, I would like to find an alternative to the 1.0 solution now, if possible.
You could just create several different models, push them onto the GPU, and feed in your data.
I suppose TensorFlow is doing the same.
However, since the GPU has limited resources, your performance might suffer, as the models might have to wait for each other.
You could try using the CPU instead, e.g. if you only have single input images.
As a side note: Glow might also be interesting for you.
# ... some code here
torch.save(model.state_dict(), "{}.pt".format(output_name))
# NOTE: output_name is modelX_name in next steps
Then I load the model this way:
model1 = ... # your model1. E.g, model1 = CatDogClassifier().to('cuda')
model2 = ... # your model2
model1.load_state_dict(torch.load(model1_name))
model2.load_state_dict(torch.load(model2_name))
And if I want to make a prediction, I simply type:
X = ... #some data.
# e.g, X = torch.tensor(X, requires_grad=False, dtype=torch.float).to('cuda')
out1 = torch.max(model1(X), 1)[1]
# and for numpy output if you use 'cuda':
# out1 = torch.max(model1(X), 1)[1].cpu().numpy()
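Put together, the save/load/predict steps above can be sketched end to end. The model class and file name below are placeholders for the real ones (e.g. `CatDogClassifier`); the `eval()` and `no_grad()` calls are the usual additions for inference:

```python
import torch
import torch.nn as nn

# Hypothetical classifier standing in for the real model class.
class TinyClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 2)

    def forward(self, x):
        return self.fc(x)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Save: serialize only the weights.
model = TinyClassifier().to(device)
torch.save(model.state_dict(), "model1.pt")

# Load: instantiate the class first, then restore the weights.
model1 = TinyClassifier().to(device)
model1.load_state_dict(torch.load("model1.pt", map_location=device))
model1.eval()  # switch dropout/batchnorm layers to inference mode

# Predict: disable autograd bookkeeping for inference.
X = torch.randn(4, 10, device=device)
with torch.no_grad():
    out1 = torch.max(model1(X), 1)[1]

print(out1.cpu().numpy())  # predicted class indices
```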
Please correct me if I’m wrong. Could you provide something like pseudocode if I’ve missed something?
So, in my snippet, model1 = CatDogClassifier() instantiates the model class before loading the weights. Is it possible to avoid this step?
Also, load_state_dict raises an error about unexpected and missing keys.
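That error usually means the checkpoint’s keys don’t match the current module’s attribute names, e.g. when the checkpoint was saved from an nn.DataParallel-wrapped model, which prefixes every key with "module.". A common fix, sketched with a toy model:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)

# Simulate a checkpoint saved from a DataParallel-wrapped model:
# every key is prefixed with "module.".
state = {"module." + k: v for k, v in model.state_dict().items()}

# Loading it directly would fail with missing/unexpected keys,
# so strip the prefix before calling load_state_dict.
fixed = {k.replace("module.", "", 1): v for k, v in state.items()}
model.load_state_dict(fixed)  # loads cleanly now
```

If the mismatch is intentional (e.g. only part of the model should be restored), `load_state_dict(..., strict=False)` also suppresses the error.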
Note for readers who look through this post later: I edited the post a bit with the solution, so don’t get confused by my or @ptrblck’s later comments.