(libtorch) Save the MNIST C++ example's trained model to a file, and load it from another C++ file to use for prediction?

Hi, I am exploring how to train the MNIST model in C++, save the model, and have another C++ program load the file and use it for inference.

I tried the methods from "(libtorch) How to save model in MNIST cpp example?":

  1. Using the original mnist.cpp, add three lines of code to save the model:
     torch::serialize::OutputArchive output_archive;
     model.save(output_archive);
     output_archive.save_to("model.pt");
  2. Change Net to NetImpl as suggested, and save the model with:
     torch::save(model, "model.pt");

With both methods, I am able to compile and run the code and save the model. However, when I try to load the model from another C++ file, I get the error “‘ScriptModule’ object has no attribute ‘forward’”.

To simplify testing, I tried to load the model in Python:

  1. Load with torch.load
import torch
# load by torch.load
model = torch.load('model.pt')
## Error loading 
## RuntimeError: model_impl.pt is a zip archive (did you mean to use torch.jit.load()?)
  2. Load with torch.jit.load
# Loaded successfully, but...
model = torch.jit.load('model.pt')
model.eval()

# Python output
ScriptModule(
  (conv1): ScriptModule()
  (conv2): ScriptModule()
  (conv2_drop): ScriptModule()
  (fc1): ScriptModule()
  (fc2): ScriptModule()
)

# When I try a forward pass, I get the following error.
output = model(torch.ones(1, 1, 28, 28))

## Error : AttributeError: 'ScriptModule' object has no attribute 'forward'

Apologies if I’ve missed anything.

Thanks.

Regards,
CL

You are probably using torch::jit::load, but you should use torch::load.

Hi, thanks for your reply. However, as mentioned in my post, when I use torch::load, I get this error:

Error loading
RuntimeError: model_impl.pt is a zip archive (did you mean to use torch.jit.load()?)

But this error is from Python, not C++.

Hi, thanks for pointing this out. I thought the two behaved identically, but it seems they do not.

I tried the following simple program:

#include <torch/torch.h>

struct NetImpl : torch::nn::Module {};
TORCH_MODULE(Net);

int main() {
  Net model;
  torch::load(model, "net.pt");
  auto in = torch::rand({1, 1, 28, 28});
  auto out = model->forward(in);
  std::cout << in << std::endl;
  std::cout << out << std::endl;
  return 0;
}

Compilation gives the following error:

error: ‘struct NetImpl’ has no member named ‘forward’
auto out = model->forward(in);

I have to define the same NetImpl that I used for training at the top of the code to make it work. So it seems that torch::save only saves the parameters and not the network structure. Am I correct?

If so, is there any way to save everything in C++ so that I can call the model directly?

Thanks again.

Hi everyone,
I am facing the same problem. I created a simple CNN using the sequential implementation (torch::nn::SequentialImpl). I can call model->forward(Sometensor), but it crashes when I save the model and load it again with jit.

The root of the problem is unclear, since I am getting an unhandled exception.

I opened an issue here: https://github.com/pytorch/pytorch/issues/25142.

Hi, I saw that you solved this problem with torch::save and torch::load. Could you share your example?

I was using the MNIST example. After saving the model with torch::save, I have to define the same model at the top of the other C++ file before I can use torch::load.

Thanks.

Regards,
Chin Luh

My network is a simple CNN like this:

struct NetworkImpl : torch::nn::SequentialImpl {
  NetworkImpl() {
    // Network here
  }
};

TORCH_MODULE(Network);

While training, I save the model inside the loop like this:

for (size_t i = 0; i < options.iterations; ++i) {
  train(network, *train_loader, optimizer, i + 1, train_size);
  test(network, *test_loader, test_size);

  /* Save model */
  torch::save(network, "Path_to_modelSaveFolder\\model.pt");
}

Then I load it:

Network net;
torch::load(net, model_path);

Let me know if you still need help.

Hi, thanks for your reply. Are you saving and loading in the same C++ file, or in separate files?

Regards,
Chin Luh

@tancl There are a few scenarios that C++/TorchScript serialization supports:

  • Save as C++ model, load using torch::load() in C++
    • Requirement: You need to have the same C++ model class definition available when you use torch::save and torch::load. The easiest way to achieve this is to put the model class definition in a common header file.
  • If you want to be able to debug the model in Python, the suggested way is to define the model in Python (and perform debugging), convert it to TorchScript, and then load the model using torch::jit::load in C++ (for details on this process, see https://pytorch.org/tutorials/advanced/cpp_export.html).

These are scenarios that C++/TorchScript serialization doesn’t support:

  • Save as C++ model using torch::save, load using torch::jit::load in C++
  • Save as C++ model using torch::save, load using torch.load in Python
  • Save as C++ model using torch::save, load using torch.jit.load in Python

Obviously, if you save a model, you’ll need to load the same one. As @yf225 mentioned, put the model in a common header file.

Hi,

Thanks for the details @yf225 and @Ziri_Ziri.

My intention is to integrate libtorch with Scilab. The user would define their own model (in any form: different conv layers, etc.), which a Scilab gateway (C++) would parse and save as a model file that carries the network architecture. The model would then be trained in a second C++ gateway and saved again with the trained parameters. Finally, the trained model would be called from a third C++ gateway for inference. I am not sure whether this can be done. Your explanation is consistent with the documentation, so I was trying my luck in case there is an undocumented way or an idea for how to do this.

Thanks again.

Regards,
Chin Luh

@tancl Thanks a lot for the use case information and it’s really helpful. I am thinking of two possible options:
Option 1 - Ask the user to define their C++ model outside of Scilab and in a common header file, and then all Scilab gateways can use this header file to find the definition of the C++ model.
Option 2 - Ask the user to define their TorchScript model outside of Scilab and serialize the model, and then the Scilab gateways can load this TorchScript model and run training / inference with it.

Please let me know if either of these two options would work.

Hi, thanks for the prompt reply.

I will have C++ gateways that call libtorch, compiled and linked so that they become native functions in Scilab. On the user's end, they will just call something like: trained_model = torch_train(data, target, model_arch, configs).

Option 1 - Correct me if I am wrong, but this would require recompiling the code every time a model is defined in the header, so it might not be flexible, unless the model could be passed as an input to the function.

Option 2 - I will explore this option further. I remember there were some changes in 1.1 and 1.2, so I will try this on 1.2 and get back to you. My previous test with the MNIST example gave a runtime error, something like “forward method not define”; I will confirm this again.

Again, thanks for your detailed reply and useful suggestions.

Regards,
Chin Luh

Hi,

I tried Option 2 using the MNIST example. My steps were:

  1. Build and train a model in Python (PyTorch) using the Python MNIST example, and save it as TorchScript using the script compiler method.
# Using the example from https://github.com/pytorch/examples/tree/master/mnist/main.py with following modification
    if (args.save_model):
        my_model = torch.jit.script(model)
        my_model.save("mymodel.pt")

  2. Using the model for inference works fine in C++:
#include <torch/torch.h>
#include <torch/script.h>
#include <iostream>

int main() {
  // Net model;
  torch::jit::script::Module model;
  std::string module_path = "mymodel.pt";
  model = torch::jit::load(module_path);

  // Create a vector of inputs.
  std::vector<torch::jit::IValue> inputs;
  inputs.push_back(torch::ones({1, 1, 28, 28}));

  // Execute the model and turn its output into a tensor.
  at::Tensor output = model.forward(inputs).toTensor();
  std::cout << output << std::endl;

  return 0;
}
  3. However, I ran into difficulty when I wanted to load the model and train it in C++:
// Using the example from https://github.com/pytorch/examples/blob/master/cpp/mnist/mnist.cpp, with the Net definition block at the beginning of the code removed, and loading the model previously trained in Python with jit::load:

  //Net model;
  //model.to(device);
 
  torch::jit::script::Module model;
  std::string module_path = "mymodel.pt";
  model = torch::jit::load(module_path);  
  model.to(device);

  auto train_dataset = torch::data::datasets::MNIST(kDataRoot)
                           .map(torch::data::transforms::Normalize<>(0.1307, 0.3081))
                           .map(torch::data::transforms::Stack<>());
  const size_t train_dataset_size = train_dataset.size().value();
  auto train_loader =
      torch::data::make_data_loader<torch::data::samplers::SequentialSampler>(
          std::move(train_dataset), kTrainBatchSize);

  auto test_dataset = torch::data::datasets::MNIST(
                          kDataRoot, torch::data::datasets::MNIST::Mode::kTest)
                          .map(torch::data::transforms::Normalize<>(0.1307, 0.3081))
                          .map(torch::data::transforms::Stack<>());
  const size_t test_dataset_size = test_dataset.size().value();
  auto test_loader = torch::data::make_data_loader(std::move(test_dataset), kTestBatchSize);

  torch::optim::SGD optimizer(model.parameter, torch::optim::SGDOptions(0.01).momentum(0.5));

  for (size_t epoch = 1; epoch <= kNumberOfEpochs; ++epoch) {
    train(epoch, model, device, *train_loader, optimizer, train_dataset_size);
    test(model, device, *test_loader, test_dataset_size);
  }

I get the error below:

 error: ‘struct torch::jit::script::Module’ has no member named ‘parameter’; did you mean ‘set_parameter’?
   torch::optim::SGD optimizer(model.parameter, torch::optim::SGDOptions(0.01).momentum(0.5));

I read the torch::jit documentation, and it mentions that defining nn.Parameter members lets the attributes be saved. However, how do I make this work in the model definition, in place of nn.Conv2d and nn.Linear?

Thanks again in advance.

Regards,
Chin Luh