I trained a model in PyTorch and then saved it in TorchScript format using torch.jit.save.
Now I want to retrain this model. My question is whether a TorchScript model can be used for training.
Thanks!
Yes, you can train your model in libtorch,
and @krshrimali has published some blog posts with examples in this post.
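For reference, here is a minimal sketch of the kind of pure-libtorch training loop those posts walk through (this is not the blog's actual code; the layer sizes, data, and loss are placeholders):

#include <torch/torch.h>

int main() {
  // A plain libtorch module trained directly in C++.
  torch::nn::Linear linear_layer{512, 2};
  torch::optim::Adam optimizer(linear_layer->parameters(),
                               torch::optim::AdamOptions(1e-3));

  // Random stand-in data; replace with your real batches.
  auto input = torch::randn({8, 512});
  auto target = torch::randn({8, 2});

  for (int epoch = 0; epoch < 5; ++epoch) {
    optimizer.zero_grad();
    auto loss = torch::mse_loss(linear_layer->forward(input), target);
    loss.backward();
    optimizer.step();
  }
}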
Thanks for your reply. I read @krshrimali's blog, and I have a few more questions about how to train a TorchScript model in C++.
I want to use a trained model for fine-tuning. I generated the TorchScript model in PyTorch. In the C++ API, I load the model using the torch::jit::load function, and then I want to retrain it.
In my code:
torch::jit::script::Module m_model = torch::jit::load(m_modulePath);
torch::optim::SGD optimizer(m_model.parameters(), SGDoptions);
When I set up the optimizer, the compiler told me that the type of the first argument was incorrect.
Thanks @ptrblck for the mention.
Just to understand the error better, @ChenyijunAaron - it would help if you could share the exact error message here.
Thanks for your reply.
I want to fine-tune a model. The model was generated using PyTorch, and I then load the .pt model in libtorch.
When I call the constructor of torch::optim::SGD, the compiler reports that the first argument I pass does not match. See the following code:
torch::jit::script::Module m_model = torch::jit::load(m_modulePath);
torch::optim::SGDOptions SGDoptions(m_train_lr);
torch::optim::SGD optimizer(m_model.parameters(), SGDoptions);
(m_model was generated in PyTorch using torch.jit.trace and torch.jit.save.)
But when I use the following code (from your blog), I get no error:
torch::nn::Linear linear_layer{ 512,2 };
torch::optim::Adam optimizer(linear_layer->parameters(), torch::optim::AdamOptions(1e-3));
Is it because a TorchScript model can't backpropagate in libtorch? If I want to train a TorchScript model, what should I do?
Thank you so much.
Hi ChenyijunAaron,
Glad to discuss training and fine-tuning a Python-saved .pt module in C++ with libtorch here. The error is caused by a type mismatch: the TorchScript module's parameters() returns a torch::jit::parameter_list, while the optimizer expects a std::vector<at::Tensor>.
I also ran into this problem and am still debugging it.
How to fix it? For now I have gone back to training with pure C++ and given up on importing the Python model. I will also try the import route when I have time. If you solve it, it would be great to share it here.
Enjoy the coding!
I got a little farther by passing the parameters as a list to the optimizer, following the GitHub example here, but I got a bunch of inplace errors when calculating the loss.
I too am thinking a pure C++ approach may be the path forward for reinforcement learning applications.
Still no answer to this? Is there no easy way to do it? I'm also stuck on this problem: I can't construct the optimizer from the loaded jit'ed model's parameters.
I was able to train a loaded TorchScript GRU model after making some modifications. I was getting a lot of inplace exceptions, which I had to address.
On the Python side I avoided any inplace operations: I set inplace=False for nn.ReLU in my layer definitions and rewrote any self-assignments (-=, x = x * …) on model layers.
To find where the inplace writes were, I traced the model and printed the output of dump_alias_db() (for example, print(traced_script_model.graph.dump_alias_db())). This shows where the writes happen in your TorchScript model.
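If you prefer to inspect from the C++ side, you can also print the loaded method's graph and look for ops whose names end in an underscore (aten::add_ and friends), which are the inplace ones. A rough sketch with a placeholder path; as far as I know dump_alias_db() itself is only exposed in Python:

#include <torch/script.h>
#include <iostream>

int main() {
  torch::jit::script::Module module = torch::jit::load("model.pt");
  // Dump the forward graph; inplace ops appear with a trailing underscore.
  std::cout << *module.get_method("forward").graph() << std::endl;
}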
On the C++ side you may also need to clone tensors that are being modified. In my case only the loss tensor needed to be cloned, because I believe the call to backward() caused an inplace operation when computing the gradients. Enabling anomaly detection also helped.
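Roughly like this (a sketch, not my exact code; DetectAnomalyGuard exists in newer libtorch releases, while older ones expose torch::autograd::AnomalyMode::set_enabled instead, and you may need to include torch/csrc/autograd/anomaly_mode.h directly):

#include <torch/torch.h>

int main() {
  // Stand-in tensors for a model output and target.
  auto output = torch::randn({4, 2}, torch::requires_grad());
  auto target = torch::randn({4, 2});

  // Scoped anomaly detection: backward() reports the op that produced
  // a bad gradient, which helps locate the offending inplace write.
  torch::autograd::DetectAnomalyGuard guard;

  // Cloning gives the loss its own storage, which broke the aliasing
  // that was triggering the inplace exception for me.
  auto loss = torch::mse_loss(output, target).clone();
  loss.backward();
}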
Also, I was trying to reuse the hidden-state tensor between calls to the training-step function, which caused an inplace exception until I switched to preserving the floating-point values instead and recreating the tensor each time.
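Something like the following, assuming a GRU-style hidden state; the shapes and the helper name are made up:

#include <torch/torch.h>
#include <vector>

// Hypothetical helper: keep the hidden state as raw floats between
// training steps and rebuild a fresh tensor on every call, instead of
// reusing one tensor across steps (which raised an inplace exception).
torch::Tensor rebuild_hidden(std::vector<float>& values, int64_t num_layers,
                             int64_t batch, int64_t hidden_size) {
  return torch::from_blob(values.data(),
                          {num_layers, batch, hidden_size}, torch::kFloat)
      .clone();  // clone so the returned tensor owns its storage
}

// After each step, copy the new state back out to plain floats:
// auto flat = new_hidden.detach().contiguous().view(-1);
// std::vector<float> values(flat.data_ptr<float>(),
//                           flat.data_ptr<float>() + flat.numel());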
For the optimizer, you read out the parameters and then pass them to your optimizer instance:
// Copy the TorchScript module's parameters into a plain vector.
std::vector<torch::Tensor> params;
for (const auto& p : torchscript_model.named_parameters()) {
  params.push_back(p.value);
}
torch::optim::Adam optimizer(params, torch::optim::AdamOptions(0.005));

// Then the usual training step:
optimizer.zero_grad();
loss.backward({}, true);  // the second argument retains the graph
optimizer.step();
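Putting the pieces together, a rough end-to-end fine-tuning step could look like this; it assumes the traced model takes a single tensor and returns a single tensor, and the path, shapes, and loss are placeholders:

#include <torch/script.h>
#include <torch/torch.h>
#include <vector>

int main() {
  // Load the model saved from Python with torch.jit.save.
  torch::jit::script::Module model = torch::jit::load("model.pt");
  model.train();  // enable training-mode behavior (dropout, batchnorm, ...)

  // Copy the parameters into a vector the optimizer accepts.
  std::vector<torch::Tensor> params;
  for (const auto& p : model.named_parameters()) {
    params.push_back(p.value);
  }
  torch::optim::Adam optimizer(params, torch::optim::AdamOptions(0.005));

  // Placeholder batch; shapes depend on the traced model's signature.
  auto input = torch::randn({8, 512});
  auto target = torch::randn({8, 2});

  for (int step = 0; step < 10; ++step) {
    optimizer.zero_grad();
    auto output = model.forward({input}).toTensor();
    auto loss = torch::mse_loss(output, target);
    loss.backward();
    optimizer.step();
  }
}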