Learning rate setup when the model is not being trained end-to-end

I am currently working on a model that consists of two parts: a frontend and a backend.
I am supposed to train this model in stages, i.e. first the frontend, then the backend, and finally the entire model end-to-end.
My question is: during the first and second stages, which of the following two options should I choose (see the sketch after this list)?

  1. Enable gradients for the entire model but use two learning rates: the part being trained (say the frontend) gets a non-zero learning rate and the other part (the backend) gets a learning rate of 0.
  2. Use a single non-zero learning rate, but enable gradients (`requires_grad=True`) only for the part being trained (the frontend) and keep the backend frozen, so it is effectively not updated.
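
A minimal sketch of what the two options look like in PyTorch. The two-part model and the submodule names `frontend` and `backend` are hypothetical placeholders, as are the learning rates:

```python
import torch
import torch.nn as nn

# Hypothetical two-part model for illustration only.
class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.frontend = nn.Linear(10, 10)
        self.backend = nn.Linear(10, 2)

    def forward(self, x):
        return self.backend(self.frontend(x))

model = Model()

# Option 1: gradients enabled everywhere, two parameter groups,
# with the backend's learning rate set to 0.
opt1 = torch.optim.SGD([
    {"params": model.frontend.parameters(), "lr": 1e-3},
    {"params": model.backend.parameters(), "lr": 0.0},
])

# Option 2: freeze the backend and pass only the frontend
# parameters to a single optimizer.
for p in model.backend.parameters():
    p.requires_grad = False
opt2 = torch.optim.SGD(model.frontend.parameters(), lr=1e-3)
```

Note that with option 1 the backend gradients are still computed on every backward pass even though they are never applied, while option 2 skips that work entirely.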

I would suggest using the second approach and not even creating an optimizer for the backend.
Once you have finished training the frontend, you could create the optimizer for the backend and set requires_grad=False for the frontend parameters.
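
Continuing the hypothetical model from the sketch above, the three stage transitions might look like this (learning rates are placeholders):

```python
# Stage 1: train the frontend only; no optimizer exists for the backend.
for p in model.backend.parameters():
    p.requires_grad = False
frontend_opt = torch.optim.SGD(model.frontend.parameters(), lr=1e-3)
# ... train stage 1 ...

# Stage 2: freeze the frontend, unfreeze the backend, and only now
# create the backend optimizer.
for p in model.frontend.parameters():
    p.requires_grad = False
for p in model.backend.parameters():
    p.requires_grad = True
backend_opt = torch.optim.SGD(model.backend.parameters(), lr=1e-3)
# ... train stage 2 ...

# Stage 3: unfreeze everything and fine-tune end-to-end
# with a single optimizer over all parameters.
for p in model.parameters():
    p.requires_grad = True
full_opt = torch.optim.SGD(model.parameters(), lr=1e-4)
# ... train stage 3 ...
```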
