At the moment I'm trying to implement a CVAE which I want to retrain after it has learned on reference data. For that retraining part I want to freeze every part of the network except the weights in the first encoder layer that are responsible for the conditions represented in the new data.
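For context, the encoder gets the data concatenated with the condition vector, so the condition weights are the last columns of the first layer's weight matrix. A minimal sketch of what I mean (the sizes are made up):

```python
import torch
import torch.nn as nn

x = torch.randn(8, 784)                    # batch of data
c = torch.zeros(8, 10)                     # one-hot conditions
c[:, 3] = 1
first_layer = nn.Linear(784 + 10, 256)     # the only layer I want to (partially) train
h = first_layer(torch.cat([x, c], dim=1))  # last 10 input columns only ever see c
```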
What I'm doing is to use the `requires_grad` flag and set it to `False` for every layer except the first encoder layer before retraining. On top of that I set the gradient to 0 in the trainer during training like this:
```python
for name, p in self.model.named_parameters():
    if p.grad is None or p.grad.dim() < 2:  # skip frozen layers (grad is None) and 1-D biases
        continue
    p.grad[:, :-self.model.conditions] = 0  # zero everything except the condition columns
```
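The freezing step before retraining looks roughly like this (`encoder.0` is just a placeholder for whatever the first encoder layer is actually called in my model):

```python
# freeze everything except the first encoder layer
for name, p in self.model.named_parameters():
    p.requires_grad = name.startswith("encoder.0")
```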
I'm using an Adam optimizer for this model, but sadly the results aren't really good. Does this way of freezing the weights disturb the optimizer? Or is there a better option to partially freeze a layer in PyTorch?
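One thing I was wondering is whether I should instead pass only the still-trainable parameters to Adam, so it never builds state for the frozen ones, and keep the column masking from above only for the partial freeze inside the first layer:

```python
import torch

trainable = [p for p in self.model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-3)  # lr is just an example value
```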
I would be glad about any help!