Serving Model Trained with PyTorch

TensorFlow has TensorFlow Serving. I know PyTorch is a framework in its early stages, but how do people serve models trained with PyTorch? Must it be from Python? I’m specifically looking to serve from C++.


We don’t have a way to serve models from C++ right now, and it’s not a priority for us at this stage. There are many things like distributed training and double backward that we’ll be implementing first. Sorry!


Would you say that PyTorch was built with serving in mind, e.g. for an API, or more for research purposes?

We’re more research oriented. We’re rather thinking of creating tools to export models to frameworks that are more focused on production usage like Caffe2 and TensorFlow.


Also you mentioned double backward. This is the first I’ve heard of it. I found a paper by Yann LeCun on double backpropagation, but was wondering whether it’s common to use such a method.

Hi, I’m playing with a possible solution for serving from C based on TH and THNN. It’ll be limited to statically compilable graphs of course. I should have something to share in the not so distant future.


@lantiga Awesome! Let us know if you need any help! I can answer any questions about the structure of our graphs and how you can export them. We still consider these things internal, and they will have to change in the near future to support multiple backward and lazy execution.


Thank you @apaszke! I’m aware of the fact that the graph structure is going to change considerably in the future, but delving into it now while things are simpler sounds like a good idea to me.

My plan is to focus solely on inference and implement a first graph2c “transpiler”, which will generate C code directly, without exporting to an intermediate format. It may sound hacky but it could actually be enough for us for the moment and it would avoid having to struggle with polymorphic C.
Eventually, this could become a basis for a more refined solution in which we export the graph and have a C runtime execute it.
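To make the "graph to C" idea concrete, here is a minimal sketch of such a transpiler, written in Python with only the standard library. It is not the actual graph2c code: the `Node` class, the `TEMPLATES` table, and the two supported ops (`add`, `relu`) are hypothetical stand-ins for a real autograd graph, and every buffer is assumed to be a flat float array of the same fixed size.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One operation in a static inference graph (hypothetical structure)."""
    name: str                 # C variable name for this node's output
    op: str                   # "input", "add", or "relu"
    inputs: List["Node"] = field(default_factory=list)


# C expression per op; {0} and {1} are the names of the input buffers.
TEMPLATES = {
    "add": "{0}[i] + {1}[i]",
    "relu": "{0}[i] > 0.0f ? {0}[i] : 0.0f",
}


def topo_order(out: Node) -> List[Node]:
    """Return the nodes reachable from `out`, inputs before their consumers."""
    seen, order = set(), []

    def visit(n: Node) -> None:
        if id(n) in seen:
            return
        seen.add(id(n))
        for p in n.inputs:
            visit(p)
        order.append(n)

    visit(out)
    return order


def emit_c(out: Node, size: int) -> str:
    """Generate one C function that evaluates the graph ending at `out`."""
    nodes = topo_order(out)
    args = [f"const float *{n.name}" for n in nodes if n.op == "input"]
    lines = [f"void forward({', '.join(args)}, float *result) {{"]
    for n in nodes:
        if n.op == "input":
            continue
        expr = TEMPLATES[n.op].format(*(p.name for p in n.inputs))
        target = "result" if n is out else n.name
        if n is not out:
            # Intermediate results live in fixed-size stack buffers.
            lines.append(f"    float {n.name}[{size}];")
        lines.append(f"    for (int i = 0; i < {size}; ++i) {target}[i] = {expr};")
    lines.append("}")
    return "\n".join(lines)


# Build relu(x + y) and generate C for 4-element buffers.
x = Node("x", "input")
y = Node("y", "input")
s = Node("s", "add", [x, y])
r = Node("r", "relu", [s])
c_source = emit_c(r, size=4)
print(c_source)
```

Because the graph is static, all shapes and buffer sizes are known at generation time, which is what lets the emitted C avoid any runtime dispatch.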

This is driven by our need for slim deployments and our determination to use PyTorch in production :)


Sure that sounds cool. It doesn’t seem hacky, it’s just a graph compiler. It’s a very good start, and will likely be capable of producing small binaries. Let us know when there’s going to be any progress or in case you have any trouble. We’ll definitely showcase your solution somewhere.

Let us know, we’re also interested.
For now we will create a Python script to export to a Torch7 model, and then use: in production code

Making progress. As soon as I get the first MNIST example to compile I’ll share what I have.

We need to deploy PyTorch models to e.g. Android, so we need a method to export a model. This is my starting point. Can you please tell me if I am on the right track or if I am doing something totally stupid?

import sys
import torch
from torch import nn
from torchvision import models
from torch.utils.serialization import load_lua

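def dump(f):
    # Print the class of this graph node, then walk its inputs recursively.
    # NOTE: the bodies of the three branches were lost from the post as
    # captured; the writes below are an assumed reconstruction, not the
    # author's exact code.
    s = str(f.__class__)
    sys.stdout.write(s + '(')
    for fa in f.previous_functions:
        if isinstance(fa[0], torch.autograd.Function):
            dump(fa[0])  # another operation: recurse into it
        if isinstance(fa[0], torch.nn.parameter.Parameter):
            sys.stdout.write('param,')  # learned weight or bias
        elif isinstance(fa[0], torch.autograd.Variable):
            sys.stdout.write('input,')  # network input
    sys.stdout.write(')')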

class MyNet(nn.Module):
    def __init__(self):
        super(MyNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(16)
        self.conv2 = nn.Conv2d(3, 16, kernel_size=1, bias=True)

    def forward(self, x):
        return self.bn1(self.conv1(x))+self.conv2(x)

#net = models.alexnet()
#net=load_lua('') #Legacy networks won't work (no support for Variables)
net = MyNet()

The output for the simple MyNet will be



This will work for now, but may break in the future. We’re still actively working on autograd internals, and there are two possible directions we could take; we’re still deciding which one is best. The only caveat right now is that instead of BatchNorm you may find BatchNormBackward in the graph. Are you on Slack? I can keep you posted about the currently used data structures if you want.

Yes, please. I have just sent an invitation request for Slack to soumith.

So, if you’re interested this is what I have so far: (I think) I’m close. I’m working on serializing THStorage right now and there are probably a number of other issues, but you can start to take a peek.

I’m not sure how profoundly things will have to be reworked with the upcoming changes in autograd, but it’s fun anyway.


Quick update: as of commit 9d0fd21, both the feedforward and MNIST tests pass (they verify that the output of the compiled code matches the output from PyTorch for the same input). I also added a few scripts to get up and running quickly, so things are kind of starting to shape up. /cc @apaszke @Eugenio_Culurciello
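For anyone wanting to run the same kind of check on their own export path, here is a minimal sketch of an output-equivalence test. It is not the test from the repository: the function name `outputs_match`, the tolerances, and the sample values are all assumptions for illustration, and both outputs are assumed to be available as flat lists of floats.

```python
import math
from typing import Sequence


def outputs_match(reference: Sequence[float], compiled: Sequence[float],
                  rel_tol: float = 1e-5, abs_tol: float = 1e-6) -> bool:
    """Return True if both outputs have the same length and agree elementwise.

    A tolerance is used instead of exact equality because a compiled C build
    and the Python framework may order floating-point operations differently.
    """
    if len(reference) != len(compiled):
        return False
    return all(math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
               for a, b in zip(reference, compiled))


# Hypothetical values: `reference` stands in for the framework's output and
# `compiled` for what the generated C code produced on the same input.
reference = [0.12, -1.5, 3.0]
compiled = [0.12000001, -1.5, 3.0000002]
same = outputs_match(reference, compiled)
different = outputs_match(reference, [0.12, -1.5, 3.1])
print(same, different)
```

The tolerances here are a starting point, not a recommendation; the right values depend on the model depth and on whether both sides use the same float precision.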


This looks great! The OpenNMT guys might be interested too: @jeansenellart

Great, very nice work, thank you.

Since there are some people hacking on autograd internals, I’ve created a Slack channel, #autograd-internals. I’ll be sending @channel messages every time we make a breaking change to our representation so you can stay up to date.

@lantiga Awesome!

Via @mvitez:
For your information, I have created a PyTorch exporter that dumps the execution graph to a file that thnets will be able to read. All the models in torchvision work.