Multi-Output Model

I have a multi-output model in PyTorch. I train both outputs with the same loss function and combine the two losses before backpropagating, but when one output's loss decreases the other increases, and vice versa. How can I fix this problem?

def forward(self, x):
    # neural network arch. forward pass
    x = self.dense1(x)
    x1 = self.dense2(x)
    x2 = self.dense2(x)
    x1 = F.log_softmax(x1, dim=1)
    x2 = F.log_softmax(x2, dim=1)
    return x1, x2

out1, out2 = model(data)
loss1 = NLLL(out1, target1)
loss2 = NLLL(out2, target2)
loss = loss1 + loss2
loss.backward()

When loss1 decreases, loss2 increases, and when loss2 decreases, loss1 increases. How can I fix this issue?
Can an operator other than '+' be used to combine the losses, or should I weight the different losses?

I’m not sure that what you have here is a multi-output model, per se. Instead, you have a single-output model where you would like the output to be as close as possible to two different targets.

You have one input x, and you pass it through the same network layers (with identical weights) to produce x1 and x2, which are identical. You then compare this single value (albeit with two different names) to the targets target1 and target2 to calculate loss1 and loss2. Thus, your network is searching for the single set of weights for dense1 and dense2 that will produce a single output value with the lowest loss value on average. It makes sense that when one loss value decreases, the other increases, and vice versa, because if target1 and target2 are different, the network will only be able to predict one at a time (again, because the values x1 and x2 are identical).
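To see this concretely, here is a minimal sketch (the layer sizes and batch shape are made up for illustration) showing that calling the same layer twice on the same input produces identical outputs:

import torch
import torch.nn as nn
import torch.nn.functional as F

dense1 = nn.Linear(16, 32)           # hypothetical layer sizes
dense2 = nn.Linear(32, 10)

x = torch.randn(4, 16)               # dummy batch of 4 inputs
h = dense1(x)
x1 = F.log_softmax(dense2(h), dim=1)
x2 = F.log_softmax(dense2(h), dim=1)

print(torch.equal(x1, x2))           # True: x1 and x2 hold exactly the same values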

Assuming that what you actually want is a network that predicts both targets from a given input:

def forward(self, x):
    # neural network arch. forward pass
    x = self.dense1(x)
    x1 = self.dense2a(x)
    x2 = self.dense2b(x)
    x1 = F.log_softmax(x1, dim=1)
    x2 = F.log_softmax(x2, dim=1)
    return x1, x2

for data, target1, target2 in data_loader:  # iterate over your dataset
    out1, out2 = model(data)
    loss1 = NLLL(out1, target1)
    loss2 = NLLL(out2, target2)
    loss = loss1 + loss2
    loss.backward()
    optimizer.step()       # assuming the optimizer was built over model.parameters()
    optimizer.zero_grad()

In this way, different weights will be learned for layers dense2a and dense2b, making them more suitable for predicting target1 and target2, respectively.
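For completeness, here is a minimal sketch of what the full module might look like; the class name, layer sizes, and optimizer choice are only placeholders for illustration:

import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoHeadNet(nn.Module):
    # hypothetical sizes: 16 input features, 32 hidden units, 10 classes per head
    def __init__(self, in_features=16, hidden=32, n_classes=10):
        super().__init__()
        self.dense1 = nn.Linear(in_features, hidden)    # shared trunk
        self.dense2a = nn.Linear(hidden, n_classes)     # head for target1
        self.dense2b = nn.Linear(hidden, n_classes)     # head for target2

    def forward(self, x):
        x = self.dense1(x)
        x1 = F.log_softmax(self.dense2a(x), dim=1)
        x2 = F.log_softmax(self.dense2b(x), dim=1)
        return x1, x2

model = TwoHeadNet()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
NLLL = nn.NLLLoss()    # the loss function used in the snippets above

Both heads share dense1, so the trunk learns features useful for both tasks while each head specializes in its own target.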

Hope this is what you were looking for!

Thanks for the suggestion. It was helpful. I would also like to know whether there is a different way to combine the losses such that both decrease at the same time.
Like (loss1^2 + loss2^2)^(1/2)?

OK, it’s important to understand what the loss actually is here so that you can achieve the results you want. The loss is just a number, calculated as some function of the outputs and targets of your network. Different functions give different representations of the error your network is making, but the lowest possible error you can achieve is 0. Thus, there is no need to sum the squares of these terms (that is generally done when some of the terms may be negative).

Now, to get to your main question: you would like to constrain the network so that it cannot exploit just one of the loss terms. If you find that one portion of the loss is being optimized much more than the other, you can weight the terms with coefficients to achieve the desired training behaviour, for example:

total_loss = loss1 + 10 * loss2

Alternatively, you could multiply these two components together or perform any number of other operations on them. For example, in the following formula, the total_loss is increased in proportion to the difference in magnitude of loss1 and loss2, helping to constrain them to be roughly equal.

total_loss = (loss1 + loss2) * (1 + torch.abs(loss1 - loss2))

Ultimately, the single loss value is all that your network sees and uses for determining the magnitude of updates to its weights, so craft your loss value such that it has the properties you want.
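To illustrate, here is how such a combined loss could be dropped into the training step from earlier; the dummy tensors, the factor 10, and the balance penalty are placeholders for illustration, not recommended values:

# reusing model, optimizer, and NLLL from the sketch above
data = torch.randn(4, 16)                # dummy batch
target1 = torch.randint(0, 10, (4,))     # dummy class labels for head 1
target2 = torch.randint(0, 10, (4,))     # dummy class labels for head 2

out1, out2 = model(data)
loss1 = NLLL(out1, target1)
loss2 = NLLL(out2, target2)

# either weight one component more heavily...
total_loss = loss1 + 10 * loss2
# ...or penalize a large gap between the two components
total_loss = (loss1 + loss2) * (1 + torch.abs(loss1 - loss2))

optimizer.zero_grad()
total_loss.backward()    # gradients flow from this single scalar to all parameters
optimizer.step()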