I am currently working on a model that consists of two parts: a frontend and a backend.
I am supposed to train this model in stages, i.e. first the frontend, then the backend, and finally the entire model end-to-end.
My question is: in the first and second stages, which of the following two options should I choose?
- Set grad_enabled to true for the entire model but use two learning rates. The part being trained (say, the frontend) gets a non-zero learning rate and the other part (say, the backend) gets a learning rate of 0.
- Use a single non-zero learning rate but set grad_enabled to true only for the part being trained (say, the frontend), so that the backend is effectively trained with lr = 0.
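
To make the two options concrete, here is a minimal sketch of what I mean for the first stage (training the frontend only). It assumes PyTorch, and the layer sizes, the `nn.Linear` stand-ins for the frontend and backend, the SGD optimizer, and the helper names are all hypothetical placeholders, not my actual model.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-ins for the two parts of the model.
frontend = nn.Linear(8, 16)
backend = nn.Linear(16, 4)
model = nn.Sequential(frontend, nn.ReLU(), backend)


def make_optimizer_two_lrs(lr=1e-2):
    """Option 1: gradients enabled everywhere, backend learning rate set to 0."""
    for p in model.parameters():
        p.requires_grad_(True)
    return torch.optim.SGD(
        [
            {"params": frontend.parameters(), "lr": lr},
            {"params": backend.parameters(), "lr": 0.0},
        ],
        lr=lr,
    )


def make_optimizer_frozen_backend(lr=1e-2):
    """Option 2: one learning rate, gradients enabled only for the frontend."""
    for p in frontend.parameters():
        p.requires_grad_(True)
    for p in backend.parameters():
        p.requires_grad_(False)
    # Only parameters that still require gradients are handed to the optimizer.
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.SGD(trainable, lr=lr)


def train_step(optimizer):
    """Run one optimization step on random data (placeholder for a real batch)."""
    x = torch.randn(32, 8)
    y = torch.randn(32, 4)
    model.zero_grad(set_to_none=True)  # clear gradients on every parameter
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch the backend weights stay unchanged after a step under either option. The difference is that option 1 still computes and stores gradients for the backend, while under option 2 the backend parameters have no gradients at all.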