# Backprop the min of losses from two models to both models

I have two models (`model1` and `model2`) running in a coupled manner:

```
output1 = model1(input)
output2 = model2(input)
loss1 = loss_fn(output1)
loss2 = loss_fn(output2)
loss = min(loss1, loss2)
```

How do I propagate this `loss` to both `model1` and `model2`?

NOTE: I understand

```
loss = loss1 + loss2
loss.backward()
```

The above would work, but I need the `min(loss1, loss2)` to be backpropagated to both models.

`torch.min` acts like a switch, so I’m not sure how both models should get valid gradients.

Yes, I understand that if `min(loss1, loss2)` is `loss1`, then `loss.backward()` will only backprop through `model1`, and vice versa. Please suggest a better or alternative way to solve this. I need to backprop the min of `loss1` and `loss2` through both models.
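
For illustration, here is a minimal sketch (the linear models, dummy data, and regression loss are all made up, not from the original setup) showing the switch behaviour: only the model whose loss is currently smaller receives a useful gradient.

```
import torch

# Minimal sketch with assumed models and data (for illustration only).
torch.manual_seed(0)
model1 = torch.nn.Linear(4, 1)
model2 = torch.nn.Linear(4, 1)
loss_fn = torch.nn.MSELoss()

input = torch.randn(8, 4)
target = torch.randn(8, 1)

loss1 = loss_fn(model1(input), target)
loss2 = loss_fn(model2(input), target)

loss = torch.min(loss1, loss2)  # acts like a switch between the two branches
loss.backward()

# Expect a meaningful gradient only for the model with the smaller loss;
# the other model's gradients come back as zeros (or None).
print(model1.weight.grad)
print(model2.weight.grad)
```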

Hi Alwyn!

In order for the gradients to flow through both branches of your
“min” function, you need a smooth version of min that transitions
smoothly between its two arguments instead of switching abruptly.

LogSumExp is a smooth version of the max function, and is
implemented in pytorch as torch.logsumexp(). You can turn it into
a smooth minimum by adding minus signs:

`smooth_min = -torch.logsumexp(-arg, dim)`

You can make it transition more or less abruptly by adding a
parameter `q`:

`-torch.logsumexp(-q * arg, dim) / q`
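
Putting this together, a minimal training-step sketch might look like the following (the models, data, optimizer, and the value of `q` are assumptions made for illustration, not part of the original post):

```
import torch

def smooth_min(loss1, loss2, q=1.0):
    # Soft minimum via negated logsumexp; larger q approaches the hard min.
    losses = torch.stack([loss1, loss2])
    return -torch.logsumexp(-q * losses, dim=0) / q

# Assumed setup: two small models, dummy data, one optimizer over both.
model1 = torch.nn.Linear(4, 1)
model2 = torch.nn.Linear(4, 1)
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.SGD(
    list(model1.parameters()) + list(model2.parameters()), lr=0.1
)

input = torch.randn(8, 4)
target = torch.randn(8, 1)

optimizer.zero_grad()
loss1 = loss_fn(model1(input), target)
loss2 = loss_fn(model2(input), target)
loss = smooth_min(loss1, loss2, q=5.0)
loss.backward()  # both models receive (unequal) gradients
optimizer.step()
```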

Note, if `loss1` is much larger than `loss2` (or if you transition
abruptly), then the gradients will flow only weakly back through
`model1` (which may well be what you want).
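
To see why, note that the gradient of the smooth min with respect to each loss is a softmax weight, `exp(-q * loss_i) / sum_j exp(-q * loss_j)`. A quick illustrative check with made-up loss values:

```
import torch

q = 5.0
losses = torch.tensor([2.0, 0.5], requires_grad=True)  # loss1 much larger than loss2
smooth_min = -torch.logsumexp(-q * losses, dim=0) / q
smooth_min.backward()
# roughly [0.0006, 0.9994]: almost all of the gradient goes to the smaller loss
print(losses.grad)
```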

Good luck.

K. Frank

[Edit: However, upon further consideration, this scheme might
not do what you want. Let’s say that, by happenstance, `model2`
starts doing better than `model1`. Then gradients will flow more
weakly through `model1`, so it will learn more slowly. So `model2`
will get better still while `model1` stagnates.

If `model2` gets sufficiently better than `model1`, then `model1`
will effectively drop out of the picture, regardless of whether, given
the chance and adequate training, it might have ended up doing
better than `model2`.]
