(libtorch) Save the MNIST C++ example's trained model to a file, and load it from another C++ file to use for prediction?

tancl · July 26, 2019, 6:54am

Hi, I am exploring how to train the MNIST model in C++, save the model, and have another C++ program load the file and use it for inference.

I tried the two methods from the thread "(libtorch) How to save model in MNIST cpp example?":

Using the original mnist.cpp, I added three lines of code to save the model:

     torch::serialize::OutputArchive output_archive;
     model.save(output_archive);
     output_archive.save_to("model.pt");

Replacing Net with NetImpl as suggested, and saving the model with:

     torch::save(model, "model.pt");

With both methods, I am able to compile and run the code and save the model. However, when I try to load the model from another C++ file, I get the error “‘ScriptModule’ object has no attribute ‘forward’”.

To simplify testing, I tried loading the model in Python:

Load with torch.load

import torch
# load by torch.load
model = torch.load('model.pt')
## Error loading 
## RuntimeError: model_impl.pt is a zip archive (did you mean to use torch.jit.load()?)

Load by torch.jit.load

# Loaded successfully, but...
model = torch.jit.load('model.pt')
model.eval()

# Python output
ScriptModule(
  (conv1): ScriptModule()
  (conv2): ScriptModule()
  (conv2_drop): ScriptModule()
  (fc1): ScriptModule()
  (fc2): ScriptModule()
)

# When try to have a forward pass, I get the following error.
output = model(torch.ones(1, 1, 28, 28))

## Error : AttributeError: 'ScriptModule' object has no attribute 'forward'

Apologies if I’ve missed anything.

Thanks.

Regards,
CL

soldierofhell · July 27, 2019, 10:46am

You are probably using torch::jit::load, but you should use torch::load.

tancl · July 27, 2019, 2:46pm

Hi, thanks for your reply. However, as mentioned in my post, when I use torch::load, I get this error:

Error loading
RuntimeError: model_impl.pt is a zip archive (did you mean to use torch.jit.load()?)

soldierofhell · July 28, 2019, 9:49am

But this error is from Python, not C++.

tancl · July 29, 2019, 2:36am

Hi, thanks for pointing this out. I thought the behavior was identical, but it seems it is not.

I tried the following minimal program:

#include <torch/torch.h>
#include <iostream>

struct NetImpl : torch::nn::Module {};
TORCH_MODULE(Net);

int main() {
    Net model;
    torch::load(model, "net.pt");
    auto in = torch::rand({1, 1, 28, 28});
    auto out = model->forward(in);  // fails to compile: NetImpl defines no forward()
    std::cout << in << std::endl;
    std::cout << out << std::endl;
    return 0;
}

Compilation gives the following error:

error: ‘struct NetImpl’ has no member named ‘forward’
auto out = model->forward(in);

To make it work, I have to define the same NetImpl that I used for training at the top of the code. So it seems that torch::save only saves the parameters and not the network structure. Am I correct?

If so, is there any way to save everything in C++ so that I can call it directly?

Thanks again.

Ziri_Ziri · August 24, 2019, 11:03am

Hi everyone,
I am facing the same problem. I created a simple CNN using the sequential implementation (torch::nn::SequentialImpl). I can call model->forward(Sometensor), but it crashes when I save the model and load it again with jit.

The root of the problem is unclear, since I am getting an unhandled exception.

I opened an issue here: https://github.com/pytorch/pytorch/issues/25142.

tancl · August 28, 2019, 7:19am

Hi, I saw that you solved this problem with torch::save and torch::load. Could you share your example?

I was using the MNIST example. After saving the model with torch::save, I have to define the same model at the top of the other C++ file before I can use torch::load.

Thanks.

Regards,
Chin Luh

Ziri_Ziri · September 1, 2019, 3:27am

My network is a simple CNN like this:

    struct NetworkImpl : torch::nn::SequentialImpl {
        NetworkImpl() {
            // Network here
        }
    };

    TORCH_MODULE(Network);

While training in a loop, I save the model like this:

    for (size_t i = 0; i < options.iterations; ++i) {
        train(network, *train_loader, optimizer, i + 1, train_size);
        test(network, *test_loader, test_size);

        // Save the model after each iteration.
        torch::save(network, "Path_to_modelSaveFolder\\model.pt");
    }

Then I load it:

    Network net;
    torch::load(net, model_path);

Let me know if you still need help.

tancl · September 3, 2019, 7:38am

Hi, thanks for your reply. Are you saving and loading in the same C++ file, or in separate files?

Regards,
Chin Luh

yf225 · September 3, 2019, 4:05pm

@tancl There are a few scenarios that C++/TorchScript serialization supports:

Save as C++ model using torch::save, load using torch::load in C++
- Requirement: You need to have the same C++ model class definition available when you use torch::save and torch::load. The easiest way to achieve this is to put the model class definition in a common header file.

Save as TorchScript model in Python, load using torch::jit::load in C++
- If you want to be able to debug the model in Python, the suggested way is to define the model in Python (and perform debugging), convert it to TorchScript, and then load the model using torch::jit::load in C++ (for details on this process, see https://pytorch.org/tutorials/advanced/cpp_export.html).

These are scenarios that C++/TorchScript serialization doesn’t support:

Save as C++ model using torch::save, load using torch::jit::load in C++
Save as C++ model using torch::save, load using torch.load in Python
Save as C++ model using torch::save, load using torch.jit.load in Python
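
As a rough illustration of the supported Python route above, the sketch below scripts a small model, saves it, and reloads it with torch.jit.load, at which point forward is available. TinyNet, its layer sizes, and the file name tiny_net.pt are hypothetical stand-ins, not the MNIST example's actual network.

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyNet(nn.Module):
    """Hypothetical stand-in for the MNIST network (not the example's exact model)."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 4, kernel_size=5)
        self.fc1 = nn.Linear(4 * 12 * 12, 10)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # 28x28 input -> 24x24 after the 5x5 convolution -> 12x12 after pooling
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = torch.flatten(x, 1)
        return F.log_softmax(self.fc1(x), dim=1)


def export_and_reload(path: str) -> torch.jit.ScriptModule:
    """Script the model, write a TorchScript archive, and read it back."""
    scripted = torch.jit.script(TinyNet().eval())
    scripted.save(path)
    return torch.jit.load(path)


with tempfile.TemporaryDirectory() as tmp:
    reloaded = export_and_reload(os.path.join(tmp, "tiny_net.pt"))
    output = reloaded(torch.ones(1, 1, 28, 28))
    print(output.shape)
```

Because the TorchScript archive stores the compiled forward along with the weights, the loading side does not need the Python class definition, unlike an archive written by the C++ torch::save.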

Ziri_Ziri · September 4, 2019, 9:34am

Obviously, if you save a model you’ll need to load the same one. As @yf225 mentioned, put the model in a common header file.

tancl · September 4, 2019, 9:45am

Hi,

Thanks for the details @yf225 and @Ziri_Ziri.

My intention is to integrate libtorch with Scilab. The user would define their own model (in any form, with different conv layers, etc.) and pass it to a Scilab gateway (C++), which would save it as a model file containing the network architecture. The model would then be trained in a second C++ gateway and saved again, this time with the trained parameters. Finally, the trained model would be loaded in a third C++ gateway for inference. I am not sure whether this can be done. Your explanation is consistent with the documentation, so I was trying my luck in case there is an undocumented way or idea for doing this.

Thanks again.

Regards,
Chin Luh

yf225 · September 4, 2019, 2:55pm

@tancl Thanks a lot for the use case information, it’s really helpful. I am thinking of two possible options:
Option 1 - Ask the user to define their C++ model outside of Scilab and in a common header file, and then all Scilab gateways can use this header file to find the definition of the C++ model.
Option 2 - Ask the user to define their TorchScript model outside of Scilab and serialize the model, and then the Scilab gateways can load this TorchScript model and run training / inference with it.

Please let me know if either of these two options would work.

tancl · September 5, 2019, 6:02am

Hi, thanks for the prompt reply.

I will have C++ gateways that call libtorch, compiled and linked so that they become native functions in Scilab. On the user side, they will just call something like: trained_model = torch_train(data, target, model_arch, configs).

Option 1 - Correct me if I am wrong, but this would require recompiling the code every time a model is defined in the header, so it might not be flexible, unless the model could be passed as an input to the function.

Option 2 - I will explore this option further. I remember there were some changes between 1.1 and 1.2, so I will try this on 1.2 and get back to you. My previous test with the MNIST example gave a runtime error along the lines of “forward method not defined”; I will confirm this again.

Again, thanks for your detailed reply and useful suggestions.

Regards,
Chin Luh

tancl · September 9, 2019, 6:32am

Hi,

I tried option 2 using the MNIST example. My steps were:

Build and train a model in Python (PyTorch) using the Python MNIST example, and save it as TorchScript using the script compiler method.

# Using the example from https://github.com/pytorch/examples/tree/master/mnist/main.py with following modification
    if (args.save_model):
        my_model = torch.jit.script(model)
        my_model.save("mymodel.pt")

Using the model for inference works fine in C++:

#include <torch/torch.h>
#include <torch/script.h>
#include <iostream>

int main() {
    // Load the TorchScript module exported from Python.
    std::string module_path = "mymodel.pt";
    torch::jit::script::Module model = torch::jit::load(module_path);

    // Create a vector of inputs.
    std::vector<torch::jit::IValue> inputs;
    inputs.push_back(torch::ones({1, 1, 28, 28}));

    // Execute the model and turn its output into a tensor.
    at::Tensor output = model.forward(inputs).toTensor();
    std::cout << output << std::endl;

    return 0;
}

However, I ran into difficulty when I wanted to use the model and train it in C++:

// Using the example from https://github.com/pytorch/examples/blob/master/cpp/mnist/mnist.cpp, by removing the net definition block on the beginning of the codes, and loading the model previously trained in python by using jit::load:

  //Net model;
  //model.to(device);
 
  torch::jit::script::Module model;
  std::string module_path = "mymodel.pt";
  model = torch::jit::load(module_path);  
  model.to(device);

  auto train_dataset = torch::data::datasets::MNIST(kDataRoot)
                           .map(torch::data::transforms::Normalize<>(0.1307, 0.3081))
                           .map(torch::data::transforms::Stack<>());
  const size_t train_dataset_size = train_dataset.size().value();
  auto train_loader =
      torch::data::make_data_loader<torch::data::samplers::SequentialSampler>(
          std::move(train_dataset), kTrainBatchSize);

  auto test_dataset = torch::data::datasets::MNIST(
                          kDataRoot, torch::data::datasets::MNIST::Mode::kTest)
                          .map(torch::data::transforms::Normalize<>(0.1307, 0.3081))
                          .map(torch::data::transforms::Stack<>());
  const size_t test_dataset_size = test_dataset.size().value();
  auto test_loader = torch::data::make_data_loader(std::move(test_dataset), kTestBatchSize);

  torch::optim::SGD optimizer(model.parameter, torch::optim::SGDOptions(0.01).momentum(0.5));

  for (size_t epoch = 1; epoch <= kNumberOfEpochs; ++epoch) {
    train(epoch, model, device, *train_loader, optimizer, train_dataset_size);
    test(model, device, *test_loader, test_dataset_size);
  }

I get the error below:

 error: ‘struct torch::jit::script::Module’ has no member named ‘parameter’; did you mean ‘set_parameter’?
   torch::optim::SGD optimizer(model.parameter, torch::optim::SGDOptions(0.01).momentum(0.5));

I read the torch::jit documentation, and it mentions that defining nn.Parameter attributes allows them to be saved. However, how would I do this in the model definition when replacing nn.Conv2d and nn.Linear?
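
One way to look at the question above, sketched in Python rather than C++: layers such as nn.Conv2d and nn.Linear already register their weights as parameters, so a scripted module that is saved and reloaded still exposes them through parameters(), and an optimizer can be built from that. SmallNet, the random batch, and the in-memory buffer are assumptions made for illustration. Whether the libtorch version in use offers an equivalent accessor on the C++ script module is not verified here.

```python
import io

import torch
import torch.nn as nn
import torch.nn.functional as F


class SmallNet(nn.Module):
    """Hypothetical model; nn.Conv2d and nn.Linear register their own parameters."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 2, kernel_size=5)
        self.fc = nn.Linear(2 * 24 * 24, 10)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = F.relu(self.conv(x))  # 28x28 -> 24x24
        return F.log_softmax(self.fc(torch.flatten(x, 1)), dim=1)


torch.manual_seed(0)

# Round-trip through an in-memory TorchScript archive instead of a file on disk.
buffer = io.BytesIO()
torch.jit.save(torch.jit.script(SmallNet()), buffer)
buffer.seek(0)
loaded = torch.jit.load(buffer)
loaded.train()

# The reloaded script module still lists the layers' weights and biases.
param_names = [name for name, _ in loaded.named_parameters()]
print(param_names)

# Build the optimizer from the loaded module's parameters and take one step
# on a random batch (placeholder data, not MNIST).
optimizer = torch.optim.SGD(loaded.parameters(), lr=0.01, momentum=0.5)
data = torch.rand(8, 1, 28, 28)
target = torch.randint(0, 10, (8,))

before = loaded.fc.weight.detach().clone()
optimizer.zero_grad()
loss = F.nll_loss(loaded(data), target)
loss.backward()
optimizer.step()

weights_changed = not torch.equal(before, loaded.fc.weight.detach())
print(weights_changed)
```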

Thanks again in advance.

Regards,
Chin Luh

Edwardmark · April 14, 2020, 5:52am

How can I save something with the PyTorch Python API and load it in libtorch?

How can I save a tensor in Python and load it in libtorch?

I save a tensor named prior in Python with the following code:

torch.save(prior,  'prior.pth')

And I load the tensor in libtorch using C++, by the following code:

std::vector<torch::Tensor> tensorVec;
torch::load(tensorVec, "/app/model/prior.pth");
torch::Tensor priors = tensorVec[0];

But I got the error:
terminate called after throwing an instance of ‘c10::Error’
what(): torch::jit::load() received a file from torch.save(), but torch::jit::load() can only load files produced by torch.jit.save() (load at …/torch/csrc/jit/serialization/import.cpp:285)

Why is that? And what should I do to solve the issue? Thanks in advance.
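
A workaround that is sometimes used for this, shown here as a hedged sketch: wrap the tensor in a small module, script it, and save it with torch.jit.save so that the file is a TorchScript archive, which is the format the error message says torch::jit::load expects. TensorContainer and the placeholder data are hypothetical, and the round trip below is only checked in Python.

```python
import io

import torch
import torch.nn as nn


class TensorContainer(nn.Module):
    """Hypothetical wrapper that carries one tensor inside a TorchScript archive."""

    def __init__(self, prior: torch.Tensor):
        super().__init__()
        # A buffer is serialized with the module but is not a trainable parameter.
        self.register_buffer("prior", prior)

    def forward(self) -> torch.Tensor:
        return self.prior


# Placeholder data standing in for the real prior tensor.
prior = torch.arange(12, dtype=torch.float32).reshape(3, 4)

# An in-memory buffer keeps the sketch self-contained; a real export would
# pass a file path to torch.jit.save instead.
buffer = io.BytesIO()
torch.jit.save(torch.jit.script(TensorContainer(prior)), buffer)
buffer.seek(0)

restored = torch.jit.load(buffer)
round_trip_ok = torch.equal(restored(), prior)
print(round_trip_ok)
```

On the C++ side, the module returned by torch::jit::load could then be asked for the tensor, for instance by calling its forward method with no inputs; that part is an assumption and is not shown.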

@yf225

MakGulati · December 25, 2021, 10:09pm

@yf225 I have a similar question: how can I take a model trained in PyTorch and then fine-tune it with libtorch on a C++-based device, given that I have first defined the model in C++ with the same architecture as the PyTorch model?