# Multi-loss optimizer

Hi there.

I have a question for the experts.

My model has many layers.
I compute total_loss as the sum of a loss on the final output and a loss on the output of a specific layer.
I then backpropagate total_loss, and of course there is only one optimizer.

```python
# model.py
import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        ...
        # many layers
        ...
        self.specific_layer = ...
        ...
        # many layers

    def forward(self, x):
        ...
        output1 = self.layer1(x)
        # output1 is detached, so gradients from output_s stop at specific_layer
        output_s = self.specific_layer(output1.detach())
        ...
        outputN = self.layerN(self.layerN_1(torch.cat([output1, self.layerN_2(...), ...], dim=1)))
        ...
        final_output = self.final_layer(outputN)

        return final_output, output_s
```

```python
# train.py
model = MyModel()
...
for x, final_target, specific_target in loader:  # training loop
    final_y_pred, specific_y_pred = model(x)

    final_loss = criterion(final_y_pred, final_target)
    specific_loss = criterion(specific_y_pred, specific_target)
    total_loss = final_loss + specific_loss

    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()
```

`total_loss` converges, but the loss of the specific layer (`specific_loss`) oscillates. T.T

I thought of two ways…

The first is to train with per-parameter options, applying a smaller learning rate to the parameters of the oscillating specific layer (a sketch follows below).
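
For reference, a minimal sketch of this option, assuming the layer is reachable as `model.specific_layer` (the optimizer choice and learning-rate values are placeholders):

```python
import torch.optim as optim

# separate the parameters of the oscillating layer from the rest
specific_params = list(model.specific_layer.parameters())
specific_ids = {id(p) for p in specific_params}
base_params = [p for p in model.parameters() if id(p) not in specific_ids]

# per-parameter options: a smaller learning rate for specific_layer
optimizer = optim.SGD(
    [
        {"params": base_params},                  # uses the default lr below
        {"params": specific_params, "lr": 1e-4},  # smaller lr for specific_layer
    ],
    lr=1e-2,
)
```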

The second method is to train with a separate optimizer for each part (see the sketch after this list):

1. Remove specific_layer from MyModel and change its outputs to output1 and final_output.
2. Wrap the separated specific_layer as MyModel2, which takes output1 as input and returns output_s.
3. Create an optimizer for MyModel and MyModel2, respectively.
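
A rough sketch of that setup, again with placeholder names (MyModel2, loader, the learning rates):

```python
import torch.optim as optim

model = MyModel()    # now returns output1 and final_output
model2 = MyModel2()  # the separated specific_layer: takes output1, returns output_s

optimizer1 = optim.Adam(model.parameters(), lr=1e-3)
optimizer2 = optim.Adam(model2.parameters(), lr=1e-4)

for x, final_target, specific_target in loader:
    output1, final_y_pred = model(x)
    # keep the detach so specific_loss still only updates MyModel2
    specific_y_pred = model2(output1.detach())

    final_loss = criterion(final_y_pred, final_target)
    specific_loss = criterion(specific_y_pred, specific_target)

    optimizer1.zero_grad()
    optimizer2.zero_grad()
    final_loss.backward()
    specific_loss.backward()
    optimizer1.step()
    optimizer2.step()
```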

Which is the better way?
And is there another good approach?

Based on the posted pseudo code it seems that `specific_loss` would create gradients for `self.specific_layer` only (`layer1` and the previous layers are detached and no other layers are used to create `output_s`), while `final_output` might potentially use many more layers (and thus also parameters).
If so, it could be easier for the model to drive `final_loss` down, as its optimizer would potentially update many more parameters, but that’s just my guess.
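
One way to check this is to call `backward` on each loss separately and see which parameters received a gradient (a hypothetical snippet, reusing the names from the pseudo code):

```python
# inside the training loop, after computing the losses
model.zero_grad(set_to_none=True)
specific_loss.backward(retain_graph=True)
for name, p in model.named_parameters():
    if p.grad is not None:
        print(name)  # should print only specific_layer parameters
```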

I’m unsure if it would work, but you could try to scale up the `specific_loss` to force the model to focus more on it.
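
E.g., a weighted sum, where the weight is a hyperparameter to tune (the value below is just a placeholder):

```python
specific_weight = 10.0  # placeholder: tune this value
total_loss = final_loss + specific_weight * specific_loss
```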