How can I train in C++ using a PyTorch TorchScript model?

I trained a model in PyTorch and then saved it to TorchScript format using torch.jit.save.
Now I want to retrain this model. Can a TorchScript model be used for training?

thanks


Yes, you can train your model in libtorch; @krshrimali has published some blog posts with examples in this post.


Thanks for your reply. I read @krshrimali's blog. I have a few more questions about how to train a TorchScript model in C++.
I want to use a trained model for fine-tuning. I generated the TorchScript model in PyTorch. In the C++ API, I load the model with the torch::jit::load function, and then I want to retrain it.
In my code:
torch::jit::script::Module m_model = torch::jit::load(m_modulePath);
torch::optim::SGD optimizer(m_model.parameters(), SGDoptions);

When I set up the optimizer, the compiler reported that the first argument was of the wrong type.


Thanks @ptrblck for the mention. :slight_smile:

Just to understand the error better, @ChenyijunAaron, it would help if you could share the exact error here.

Thanks for your reply. :slightly_smiling_face:

I want to fine-tune a model. The model was generated using PyTorch, and then I load the .pt model in libtorch.
When I initialize the constructor of the class torch::optim::SGD, the compiler reports that the first argument does not match the expected type. See the following code:
torch::jit::script::Module m_model = torch::jit::load(m_modulePath);
torch::optim::SGDOptions SGDoptions(m_train_lr);
torch::optim::SGD optimizer(m_model.parameters(), SGDoptions);
(m_model was generated in PyTorch using torch.jit.trace and torch.jit.save.)

But when I use the following code (from your blog), I get no error:
torch::nn::Linear linear_layer{512, 2};
torch::optim::Adam optimizer(linear_layer->parameters(), torch::optim::AdamOptions(1e-3));

Is it because a TorchScript model can’t backpropagate in libtorch? If I want to train the TorchScript model, what should I do?

Thank you so much. :slightly_smiling_face:

Hi ChenyijunAaron,

Glad to discuss training or fine-tuning a Python-saved .pt module in C++ with libtorch here. The error is caused by a type mismatch: torch::jit::script::Module::parameters() returns a torch::jit::parameter_list, while the optimizer constructor expects a std::vector<at::Tensor>.
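To make the mismatch concrete, the tensors can be copied out of the parameter_list into a std::vector before constructing the optimizer. A minimal sketch, assuming a model already loaded with torch::jit::load (file name and learning rate are illustrative; the last reply in this thread uses the same approach via named_parameters()):

    #include <torch/script.h>
    #include <torch/torch.h>
    #include <vector>

    int main() {
        torch::jit::script::Module m_model = torch::jit::load("model.pt");

        // Copy the tensors out of the parameter_list into the vector type
        // that the optimizer constructors expect.
        std::vector<torch::Tensor> params;
        for (const auto& p : m_model.parameters()) {
            params.push_back(p);  // each element is an at::Tensor
        }

        torch::optim::SGD optimizer(params, torch::optim::SGDOptions(0.01));
    }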

I have also met this problem and am still debugging it.

How to fix it? For now I train in pure C++ and have given up on importing the Python model. I will also try the import route when I have time. If you solve it, it would be great if you shared the solution here.

Enjoy the coding!

I got a little farther by passing the parameters as a list to the optimizer, following the GitHub example here, but I got a bunch of inplace errors when calculating the loss.

I too am thinking a pure C++ approach may be the way forward for reinforcement learning applications.

Still no answer to this? Is there no easy way to do it? I’m also stuck on this problem: I can’t construct the optimizer using the loaded JIT model’s parameters.

I was able to load and train a TorchScript GRU model after making some modifications. I was getting a lot of inplace exceptions, which I had to address.

On the Python side I avoided any inplace operations: I set inplace=False for nn.ReLU in my layer definitions and rewrote any in-place self-assignments (-=, x = x * …) on model layers.

To find where the inplace writes were, I traced the model and then printed the output of dump_alias_db() (e.g. print(traced_script_model.graph.dump_alias_db())). This shows you where the writes are happening in your TorchScript model.

On the C++ side you may also need to clone tensors that are being modified. In my case only the loss tensor needed to be cloned, because I believe the call to backward() caused an inplace operation when computing the gradients. Enabling anomaly detection also helped.
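Roughly what this looked like, as a sketch with placeholder names (train_step, input, target, and the MSE loss stand in for your own model and data; torch::autograd::AnomalyMode::set_enabled is, as far as I know, the C++ counterpart of torch.autograd.set_detect_anomaly):

    #include <torch/script.h>
    #include <torch/torch.h>

    // Sketch of one training step over a loaded TorchScript module.
    void train_step(torch::jit::script::Module& model,
                    const torch::Tensor& input,
                    const torch::Tensor& target,
                    torch::optim::Optimizer& optimizer) {
        torch::autograd::AnomalyMode::set_enabled(true);  // report which op wrote inplace

        auto output = model.forward({input}).toTensor();
        auto loss = torch::mse_loss(output, target);

        optimizer.zero_grad();
        loss.clone().backward();  // cloning the loss avoided the inplace error here
        optimizer.step();
    }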

Also, I was trying to reuse the hidden-state tensor between calls to the training-step function, which caused an inplace exception until I switched to preserving the floating-point values instead and recreating the tensor each time.
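Roughly what I mean, as a sketch (make_hidden, save_hidden, and hidden_size are made-up names, and the hidden-state shape {1, 1, hidden_size} assumes a single-layer, unidirectional GRU with batch size 1; the point is that only raw floats survive between steps):

    #include <torch/torch.h>
    #include <algorithm>
    #include <vector>

    // Rebuild a fresh hidden-state tensor from the preserved floats each
    // step. clone() copies the data, so the tensor owns its own storage
    // instead of aliasing the vector.
    torch::Tensor make_hidden(std::vector<float>& vals, int64_t hidden_size) {
        return torch::from_blob(vals.data(), {1, 1, hidden_size}, torch::kFloat32)
            .clone();
    }

    // After the step, keep only the raw values, dropping any autograd
    // history carried by the new hidden state (vals must already have
    // new_hidden.numel() elements).
    void save_hidden(const torch::Tensor& new_hidden, std::vector<float>& vals) {
        torch::Tensor flat = new_hidden.detach().flatten().contiguous();
        std::copy(flat.data_ptr<float>(), flat.data_ptr<float>() + flat.numel(),
                  vals.begin());
    }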

For the optimizer, read out the parameters and then pass them to your optimizer instance:

    // Collect the script module's parameter tensors into a std::vector so
    // the optimizer constructor will accept them.
    std::vector<torch::Tensor> params;
    for (const auto& p : torchscript_model.named_parameters()) {
        params.push_back(p.value);
    }

    torch::optim::Adam optimizer(params, torch::optim::AdamOptions(0.005));
    optimizer.zero_grad();
    loss.backward({}, true);  // second argument retains the graph for
                              // repeated backward passes
    optimizer.step();