It should be possible to run different models on the same GPU. However, I think you could lose a lot of performance, since the models would have to wait for each other to finish processing.
Multiprocessing might help, but I’m not really familiar with all of its limitations.
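As a rough illustration of the single-process case, here is a minimal sketch that places two models on the same device and calls them one after the other. The two `nn` modules are tiny placeholders I made up for the example (not your actual models), and the code falls back to the CPU if no GPU is available.

```python
import torch
import torch.nn as nn

# Use the GPU if one is available; otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Two small placeholder models (hypothetical stand-ins for the real ones).
model_a = nn.Linear(16, 4).to(device).eval()
model_b = nn.Sequential(
    nn.Linear(16, 32),
    nn.ReLU(),
    nn.Linear(32, 2),
).to(device).eval()

# A dummy batch of 8 samples with 16 features each.
x = torch.randn(8, 16, device=device)

# Both models live on the same device; the forward passes are issued
# one after the other from this single process.
with torch.no_grad():
    out_a = model_a(x)
    out_b = model_b(x)

print(out_a.shape, out_b.shape)
```

Both models share the device's memory and compute here, so how much they slow each other down depends on their size and on your workload. I would measure it rather than guess.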
What kind of deployment environment do you have?
You could easily set up a web server using Flask or any other framework and serve your models there.
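A minimal sketch of what such a Flask server could look like, assuming a single model that takes four input features. The `/predict` route, the `features` field, and the tiny `nn.Linear` model are all hypothetical names and placeholders chosen for the example.

```python
import torch
import torch.nn as nn
from flask import Flask, jsonify, request

app = Flask(__name__)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Placeholder model (hypothetical); load your trained model here instead.
model = nn.Linear(4, 2).to(device).eval()


@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON body such as {"features": [0.1, 0.2, 0.3, 0.4]}.
    payload = request.get_json(silent=True)
    features = payload.get("features") if isinstance(payload, dict) else None
    valid = (
        isinstance(features, list)
        and len(features) == 4
        and all(isinstance(v, (int, float)) for v in features)
    )
    if not valid:
        return jsonify(error="expected 'features' to be a list of 4 numbers"), 400

    # Build a batch of one sample and run inference without tracking gradients.
    x = torch.tensor([features], dtype=torch.float32, device=device)
    with torch.no_grad():
        y = model(x)
    return jsonify(prediction=y.squeeze(0).tolist())


# Exercise the endpoint in-process with Flask's test client (no server needed).
client = app.test_client()
response = client.post("/predict", json={"features": [0.1, 0.2, 0.3, 0.4]})
print(response.status_code, response.get_json())
```

To serve real requests you would start the app with Flask's development server or put it behind a production WSGI server. Several models could be exposed as separate routes in the same app.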
If you need a lot of throughput on a local machine, I would go for ONNX and Caffe2.
PyTorch 1.0 will support easy deployment with Caffe2 as stated here. You would have to wait a few months, though: as far as I know, the release is scheduled for this summer or autumn.