Finetuning a model on multiple GPUs in C++

In brief, my question is:
How do I resize a given (registered) module and reinitialize it with random values?

My process:
I have a TORCH_MODULE(Net) C++ class with a NetImpl deriving from torch::nn::Cloneable, and a net_imagenet.pt archive (1000 classes).
I want to finetune this model on a different number of classes N.
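
For reference, here is a minimal sketch of the kind of class I mean (the backbone is elided, and nbFeatures and the 2048 feature size are placeholders, not my actual model):

#include <torch/torch.h>

struct NetImpl : torch::nn::Cloneable<NetImpl> {
  explicit NetImpl(int64_t num_classes) : num_classes(num_classes) {
    reset();
  }

  // reset() must (re)create and register every submodule so that clone() works
  void reset() override {
    last_linear = register_module(
        "last_linear", torch::nn::Linear(nbFeatures, num_classes));
  }

  torch::Tensor forward(torch::Tensor x) {
    // ... backbone layers elided ...
    return last_linear->forward(x);
  }

  static constexpr int64_t nbFeatures = 2048;  // placeholder feature size
  int64_t num_classes;
  torch::nn::Linear last_linear{nullptr};
};
TORCH_MODULE(Net);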

Net net(N);                            // create a Net with N classes: last_linear->weight.sizes()={N, nbFeatures}, last_linear->bias.sizes()={N}
torch::load(net, "net_imagenet.pt");   // load the pretrained (1000-class) parameters: last_linear->weight.sizes()={1000, nbFeatures}, last_linear->bias.sizes()={1000}

At this stage, the net was constructed for N classes, but the loaded weight and bias of last_linear are sized for 1000 classes.
As the last_linear module is registered in Net::reset() (to keep the class cloneable), I can’t find a way to replace it with a fresh nn::Linear as you would in Python.
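
The naive, Python-style replacement (a sketch, using the placeholder names from above) doesn’t do what you’d hope:

// rebinding the member leaves the registered submodule untouched
net->last_linear = torch::nn::Linear(NetImpl::nbFeatures, N);
// the module registered under "last_linear" (the one used by clone(),
// parameters() and serialization) still points at the old 1000-class layer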
I’ve tried various hacks to reset last_linear, but found none that works on multiple GPUs:

  • changing last_linear as in Python (the naive sketch above) does no good; the original registered version is still used
  • renaming last_linear in register_module (i.e. “last_linear” to “last_linear_ft”) before loading the weights fails at torch::load() with a c10 error: No such serialized submodule: ‘last_linear_ft’
  • registering the new version of last_linear as “last_linear_ft” after loading the weights is ugly (the old layer keeps its memory) but works in single-GPU mode; see the sketch after this list. However, clone() fails: the number of modules changed outside of the constructor…
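
For reference, that third attempt looks roughly like this (again a sketch with the placeholder names from above):

Net net(N);
torch::load(net, "net_imagenet.pt");   // last_linear now holds the 1000-class parameters
// register a fresh N-class head under a new name; forward() has to be
// rewritten to call last_linear_ft instead of last_linear
auto last_linear_ft = net->register_module(
    "last_linear_ft", torch::nn::Linear(NetImpl::nbFeatures, N));
// the old last_linear stays registered (hence the memory cost), and clone()
// then aborts because the number of modules changed outside the constructor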

I definitely need a reregister_module() or something like that…

Has anyone successfully finetuned a model with the C++ API?

Indeed, replace_module seems to be missing; I’ll see if I can put up a PR and link it here.

Best regards

Thomas

Thanks for your answer!

So I filed #22546; we’ll see if it is generally liked.
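
If it gets merged, the finetuning flow from the question could look roughly like this (a sketch, assuming the interface mirrors register_module and reusing the placeholder names from above):

Net net(1000);                         // construct with the pretrained head size
torch::load(net, "net_imagenet.pt");   // load the 1000-class weights
// swap in a freshly initialized N-class head under the same registered name
net->last_linear = net->replace_module(
    "last_linear", torch::nn::Linear(NetImpl::nbFeatures, N));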

Best regards

Thomas