Loading Python model into nn::Module in libtorch

Is there a way to export a model to file in python and load it into a torch::nn::Module in libtorch?

I know it can be loaded into a torch::jit::script::Module, but that does not solve my problem. The torch::jit API is completely orthogonal to torch::nn, and I need to use torch::nn::Module specifically. Thanks!


I don’t think there’s a way to do it.

You can recreate the architecture using the torch::nn::Module API and then reload the state_dict() saved from Python, though.
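
For example, a minimal sketch, assuming the Python model is a single nn.Linear(10, 1) that was exported with torch.jit.trace as linear.pt (a plain torch.save pickle can't be read from C++ directly, as far as I know, so the weights have to travel inside a TorchScript file):

#include <torch/torch.h>
#include <torch/script.h>

int main() {
  // Recreate the same architecture by hand in C++.
  torch::nn::Linear fc(10, 1);

  // Load the TorchScript file purely as a weight container;
  // a traced bare nn.Linear exposes "weight" and "bias".
  torch::jit::Module weights = torch::jit::load("linear.pt");

  torch::NoGradGuard no_grad;
  for (const auto& p : weights.named_parameters()) {
    if (p.name == "weight") fc->weight.copy_(p.value);
    if (p.name == "bias")   fc->bias.copy_(p.value);
  }
  return 0;
}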

Thanks, @dfalbel!
And how do you suggest I extract the architecture from the model saved in Python? Assuming all I get is the .pt file that someone saved in Python, how can I recreate the architecture in libtorch with only that?

Good question, @botelho!

I don’t think that’s possible. You will need to reimplement the architecture manually, and then you can load the weights.
The .pt file saved with torch.save() contains pickled Python objects and needs a Python runtime to be loaded.

Thanks again, @dfalbel!

This really blows my mind! I work on a libtorch-based application in which I can load HDF5 models from Keras. Since the HDF5 file contains both the network graph (as a JSON string) and the weights, I can programmatically construct the nn::Module and populate its trainable parameters, all without going through any format conversion. And, ironically, I cannot do the same with a model saved from PyTorch itself in Python! I really wish some PyTorch developer would explain the rationale behind making the .pt file so convoluted, and then developing a whole new API (jit::script) just to handle it in libtorch.

I can’t speak for the PyTorch devs, but I think the JSON string representation is convenient for this kind of application, though it also has its drawbacks: for example, if you have custom layers and functions in your model, you need to recreate them (even in Python) before being able to reload the model. That doesn’t happen with torch.save(), because it pickles everything that is needed to reload the model.

What exactly do you need from the torch::nn::Module interface that you don’t have from torch::jit::Module?

Thanks again, @dfalbel. You make some good points about the limitations of the Keras approach, which might explain why PyTorch decided to go with the pickle-based solution. Although I suspect Keras has come up with ways to support loading custom layers (via lambdas). Besides, I would be more than happy if PyTorch allowed me to load at least the standard (non-custom) layers directly from Python into nn::Module!

As for your question, the main reason I need nn::Module is that we already have an entire application developed around it, and switching to another API is not really feasible. And even if it were, there are some critical use cases that I believe would not be possible. For example, I might want to augment a model by adding more layers and only load the weights for the old layers (e.g., for transfer learning), as sketched below. I would not be able to do that with jit::script::Module, since the layers themselves are wrappers around nn::Module.
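
To make that concrete, here is a rough sketch with hypothetical names (AugmentedNet, load_old_weights): the old fc layer gets its weights from the checkpoint, while the new head keeps its random initialization because its name does not appear in the old model.

#include <torch/torch.h>
#include <torch/script.h>

// Hypothetical transfer-learning setup: old backbone plus a new head.
struct AugmentedNet : torch::nn::Module {
  torch::nn::Linear fc{nullptr};    // old layer, weights come from the file
  torch::nn::Linear head{nullptr};  // new layer, keeps its random init
  AugmentedNet() {
    fc = register_module("fc", torch::nn::Linear(10, 10));
    head = register_module("head", torch::nn::Linear(10, 2));
  }
};

// Copy only the parameters whose fully-qualified names match.
void load_old_weights(AugmentedNet& net, const std::string& path) {
  torch::jit::Module old_model = torch::jit::load(path);
  auto params = net.named_parameters();
  torch::NoGradGuard no_grad;
  for (const auto& p : old_model.named_parameters()) {
    if (auto* dst = params.find(p.name)) {
      dst->copy_(p.value);
    }
  }
}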

AFAICT the way Keras works now is by serializing the TensorFlow graph, which is similar to what TorchScript does.

Can you perhaps create a torch::nn::Module that has your loaded torch::jit::Module as a submodule? Then you can treat it like any other torch::nn::Module, append layers, and so on.

I don’t think so, because only other nn::Modules (or their derived types) can be registered as submodules.

FWIW this seems to work for me:

#include <torch/torch.h>
#include <torch/script.h>
#include <iostream>

int main() {
  // Wrap a TorchScript module inside a torch::nn::Module.
  struct Net : torch::nn::Module {
    torch::jit::Module module_;

    Net(const std::string& path) {
      module_ = torch::jit::load(path);
      // Expose the jit module's parameters through the nn::Module API.
      for (const auto& par : module_.named_parameters()) {
        register_parameter(par.name, par.value);
      }
    }

    // Forward simply delegates to the wrapped jit module.
    torch::Tensor forward(torch::Tensor x) {
      std::vector<torch::jit::IValue> inputs;
      inputs.push_back(x);
      return module_.forward(inputs).toTensor();
    }
  };

  auto model = Net("./linear.pt");
  auto x = torch::ones({10, 10});
  auto y = (2 * model.forward(x)).sum();
  y.backward();

  // Gradients flow back into the registered parameters.
  std::cout << model.parameters()[0].grad() << std::endl;

  return 0;
}

So, in theory, you can wrap the jit module in an nn::Module and then call forward, backprop into it, and so on.


Thank you, that’s a promising idea! I had thought in terms of actual torch submodules, as in calling register_module on the nn::Module, but wrapping the jit::Module as a member should work too; I’ll see where that takes me. There are some immediate kinks to iron out, e.g.,

terminate called after throwing an instance of 'c10::Error'
  what():  Parameter name must not contain a dot (got 'conv1.weight')

It seems like registering the parameters will be a little more complicated…

Yeah, I guess you can recurse, e.g. with something like:

Net(torch::jit::Module module) {
  module_ = module;

  // Register each jit child as a nested Net, so the parameter hierarchy
  // is rebuilt with real submodules instead of dotted names.
  for (const auto& mod : module_.named_children()) {
    register_module(mod.name, std::make_shared<Net>(mod.value));
  }

  // Register only this level's own parameters (recurse = false).
  for (const auto& par : module_.named_parameters(false)) {
    register_parameter(par.name, par.value);
  }
}
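
And a quick usage sketch, assuming a hypothetical ./model.pt and the constructor above:

auto jit_mod = torch::jit::load("./model.pt");
auto net = std::make_shared<Net>(jit_mod);

// The dots now come from genuine submodule nesting, so registration
// no longer trips over names like "conv1.weight".
for (const auto& p : net->named_parameters()) {
  std::cout << p.key() << " " << p.value().sizes() << std::endl;
}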

Yep, that should work. Thanks again!