PyTorch in production

I need to productionize some models (written in PyTorch) behind a REST API.
TorchScript is a good candidate for this. In many production settings, however, serving a model involves many moving parts, e.g. batching, logging, monitoring, and rolling out new models.
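
To make one of those moving parts concrete, here is a minimal sketch of request batching in plain Python. The names `MicroBatcher`, `predict_batch`, and `fake_model` are hypothetical and not part of PyTorch or any serving library; `fake_model` only stands in for a call into a loaded model, and the batch size and wait time are arbitrary assumptions.

```python
import queue
import threading
import time
from concurrent.futures import Future


class MicroBatcher:
    """Collects single requests and runs them through the model in batches."""

    def __init__(self, predict_batch, max_batch_size=8, max_wait_s=0.01):
        # predict_batch is a hypothetical callable: list of inputs -> list of outputs.
        self._predict_batch = predict_batch
        self._max_batch_size = max_batch_size
        self._max_wait_s = max_wait_s
        self._queue = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, item):
        """Queue one input; the returned Future resolves to its prediction."""
        future = Future()
        self._queue.put((item, future))
        return future

    def _run(self):
        while True:
            # Block until at least one request arrives, then wait briefly for more.
            batch = [self._queue.get()]
            deadline = time.monotonic() + self._max_wait_s
            while len(batch) < self._max_batch_size:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    batch.append(self._queue.get(timeout=remaining))
                except queue.Empty:
                    break
            items = [item for item, _ in batch]
            try:
                outputs = self._predict_batch(items)
                for (_, future), output in zip(batch, outputs):
                    future.set_result(output)
            except Exception as exc:
                # Report the failure to every caller in this batch.
                for _, future in batch:
                    future.set_exception(exc)


def fake_model(inputs):
    # Stand-in for real inference; a real service would call its model here.
    return [x * 2 for x in inputs]


batcher = MicroBatcher(fake_model)
futures = [batcher.submit(i) for i in range(5)]
results = [f.result(timeout=1) for f in futures]
```

Each caller gets its own result back no matter how the requests were grouped. Logging, monitoring, and model rollout would still have to be built around this, which is the gap the question is about.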

TensorFlow has TensorFlow Serving, which solves many of these issues.

Is there anything on the roadmap for this kind of functionality anytime soon (e.g. PyTorch Serving)?
I have seen the PR:, which states that the target for an experimental release is Q4. Will this be released with 1.4 / 1.5, or is it delayed, cancelled, or still unknown?

I have seen multiple blog posts that talk about this topic, but most of them are simple examples of a Flask application.


It sounds like you may have already seen this, but we often refer questions on PyTorch model serving/deployment to this post:

A few open-source model servers support PyTorch directly. Some users roll their own, as in the Flask example you referenced. Others take an ONNX export of a trained model and load that into another model server.
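
As a rough illustration of the roll-your-own route, here is a minimal sketch of a prediction endpoint. It uses the standard-library WSGI interface rather than Flask to stay dependency-free. The `/predict` path, the `{"features": [...]}` payload, and the `predict` function are assumptions made for the example, and `predict` is only a placeholder for real model inference.

```python
import json
from wsgiref.simple_server import make_server


def predict(features):
    # Placeholder for real inference, e.g. a model loaded once at startup.
    return sum(features)


def _json_response(start_response, status, payload):
    body = json.dumps(payload).encode("utf-8")
    headers = [("Content-Type", "application/json"),
               ("Content-Length", str(len(body)))]
    start_response(status, headers)
    return [body]


def app(environ, start_response):
    """WSGI app: POST /predict with {"features": [...]} returns a prediction."""
    if (environ.get("REQUEST_METHOD") != "POST"
            or environ.get("PATH_INFO") != "/predict"):
        return _json_response(start_response, "404 Not Found",
                              {"error": "not found"})
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(length))
        prediction = predict(payload["features"])
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, missing "features", or non-numeric values.
        return _json_response(start_response, "400 Bad Request",
                              {"error": "expected JSON with a 'features' list"})
    return _json_response(start_response, "200 OK", {"prediction": prediction})


def serve(host="127.0.0.1", port=8000):
    # Development server only; not called automatically.
    with make_server(host, port, app) as server:
        server.serve_forever()
```

Calling `serve()` starts a single-threaded development server. Anything beyond that (batching, monitoring, rollouts) is what a dedicated model server is meant to provide.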

Note that for the ongoing work you referenced:

This would be a separate application, not part of the PyTorch code itself, so it wouldn’t necessarily be tied to PyTorch version numbers. Right now we’re ironing out requirements and basic design, but we’d be interested in any feature requests or feedback.

Hi Brian,

Thanks for the reply!

I have heard in many talks that “Facebook uses PyTorch for large-scale production and is exported (through TorchScript) to their optimized C++ backend for inference”. Is there any information on how this backend works? Will it ever be open-sourced, or is it too tightly coupled with FB’s products?

I haven’t worked on the internal inference platform, but my understanding is that what we do internally is tightly coupled with internal systems and unlikely to be open sourced.