Loss.backward() if multiple losses from multiple networks are present

I have a model which can be divided into two submodels.
Model1 takes the input, and then k instances of Model2 each take their input from Model1's output.
So I have k losses.
Currently, I'm doing loss = loss_1 + … + loss_k and then calling loss.backward().
My question is: how does loss.backward() ensure that the gradients of the i-th Model2 are computed from loss_i rather than from the summed loss?

Hi Daksh, calling loss.backward() populates the grad attribute of every leaf tensor in the computation graph of loss.
It would be easier to answer your question if you could share some code, please.
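
As a tiny self-contained illustration of what "populating the grad attribute of the leaf tensors" means (the tensors below are made up purely for the example):

import torch

w = torch.randn(3, requires_grad=True)  # leaf tensor
b = torch.randn(3, requires_grad=True)  # leaf tensor

loss = (w * 2 + b).sum()
loss.backward()

print(w.grad)  # tensor([2., 2., 2.])
print(b.grad)  # tensor([1., 1., 1.])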


If the i-th model contributes only to the i-th loss value, the derivatives of the other loss components with respect to that model's parameters are zero, so this happens automatically.

Another way of viewing this is that, by the chain rule, calling loss.backward() when loss is the sum of the individual losses is equivalent to calling loss_i.backward() on each loss separately.

Note that model1 would be affected by all losses, as it contributes to each term by producing the output that is used by the downstream models (a sketch of this chained case follows the example below).

import torch

# Case 1: sum the losses and call backward() once on the sum.
torch.manual_seed(0)
m1 = torch.nn.Conv2d(1, 1, 1, 1)
m2 = torch.nn.Conv2d(1, 2, 1, 1)
inp = torch.randn(1, 1, 1, 1)
loss1 = m1(inp).prod()
loss2 = m2(inp).prod()
loss = loss1 + loss2
loss.backward()
print(m1.weight.grad, m2.weight.grad)

# Case 2: same seed and same models, but call backward() on each loss separately.
torch.manual_seed(0)
m1 = torch.nn.Conv2d(1, 1, 1, 1)
m2 = torch.nn.Conv2d(1, 2, 1, 1)
inp = torch.randn(1, 1, 1, 1)
loss1 = m1(inp).prod()
loss2 = m2(inp).prod()
loss1.backward()
loss2.backward()
print(m1.weight.grad, m2.weight.grad)
$ python3 multiple.py
tensor([[[[1.2645]]]]) tensor([[[[-0.8376]]],


        [[[-1.8029]]]])
tensor([[[[1.2645]]]]) tensor([[[[-0.8376]]],


        [[[-1.8029]]]])