I have a question for the masters.
Please give me hope.
There are many layers in my model.
I compute total_loss as the sum of the loss on the final output and the loss on the output of a specific layer.
I backpropagate total_loss, and of course there is only one optimizer.
```python
import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        # ... many layers ...
        self.specific_layer = ...
        # ... many layers ...

    def forward(self, x):
        # ...
        output1 = self.layer1(x)
        output_s = self.specific_layer(output1.detach())
        # ...
        outputN = self.layerN(self.layerN_1(torch.cat([output1, self.layerN_2(...), ...], dim=1)))
        # ...
        final_output = self.final_layer(outputN)
        return final_output, output_s
```

```python
# train.py
from torch.optim import Adam

model = MyModel()
optimizer = Adam(model.parameters(), ...)
# ...
for x, final_target, specific_target in dataloader:
    optimizer.zero_grad()
    final_y_pred, specific_y_pred = model(x)
    final_loss = criterion(final_y_pred, final_target)
    specific_loss = criterion(specific_y_pred, specific_target)
    total_loss = final_loss + specific_loss
    total_loss.backward()
    optimizer.step()
```
total_loss converges, but the loss of the specific layer (specific_loss) keeps oscillating. T.T
I thought of two ways…
The first is to train with different learning rates (a smaller learning rate for the oscillating specific layer) using per-parameter options.
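A rough sketch of what I mean (assuming the oscillating layer is reachable as model.specific_layer; the learning-rate values are just placeholders):

```python
from torch.optim import Adam

# Put the specific layer's parameters in their own param group with a smaller lr.
specific_params = list(model.specific_layer.parameters())
specific_ids = {id(p) for p in specific_params}
other_params = [p for p in model.parameters() if id(p) not in specific_ids]

optimizer = Adam([
    {"params": other_params,    "lr": 1e-3},  # base learning rate
    {"params": specific_params, "lr": 1e-4},  # smaller lr for the oscillating layer
])
```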
The second method is to train with a separate optimizer:
- Remove specific_layer from MyModel and have it return output1 and final_output.
- Wrap the separated specific_layer in a second module, MyModel2, which takes output1 as input and returns output_s.
- Create one optimizer for MyModel and one for MyModel2, respectively.
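A rough sketch of that second option (reusing criterion and the data loop from the snippet above; MyModel2 and the learning rates are placeholders):

```python
from torch.optim import Adam

backbone = MyModel()   # now returns (output1, final_output), specific_layer removed
specific = MyModel2()  # takes output1, returns output_s

opt_backbone = Adam(backbone.parameters(), lr=1e-3)
opt_specific = Adam(specific.parameters(), lr=1e-4)

for x, final_target, specific_target in dataloader:
    opt_backbone.zero_grad()
    opt_specific.zero_grad()

    output1, final_y_pred = backbone(x)
    specific_y_pred = specific(output1.detach())  # keep the detach, as before

    final_loss = criterion(final_y_pred, final_target)
    specific_loss = criterion(specific_y_pred, specific_target)

    # The two graphs are disjoint because of detach(), so separate backward calls are fine.
    final_loss.backward()
    specific_loss.backward()

    opt_backbone.step()
    opt_specific.step()
```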
Which is the better way?
And is there another good way?
Thank you in advance.