How to compute the magnitude of the gradient of each loss function?

Hi,

I have a question about computing the magnitude of the gradient contributed by each loss function.

For example, if I use two losses for one deep model,
how can I compute the magnitude of the gradient produced by loss1 and the magnitude of the gradient produced by loss2 with respect to the model parameters?

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(1, 10),
    nn.Linear(10, 1)
)
...
output = model(input)
total_loss = loss1(output) + loss2(output)
total_loss.backward()
...

By “magnitude of the gradient” I mean the quantity below, which is the same thing computed inside torch.nn.utils.clip_grad_norm_:

torch.norm(torch.stack([torch.norm(p.grad.detach(), 2.0) for p in model.parameters()]), 2.0)
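(As a side note, one way I can double-check this number, assuming a reasonably recent PyTorch, is to read the total norm that clip_grad_norm_ itself returns after total_loss.backward(), using a threshold large enough that nothing actually gets clipped:)

total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=float('inf'))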

The short answer is that you cannot easily do this without doing a backward pass for each loss separately, i.e.:

# retain_graph=True keeps the graph alive so it can be backpropagated through a second time
grads1 = torch.autograd.grad(loss1(output), model.parameters(), retain_graph=True)
grads2 = torch.autograd.grad(loss2(output), model.parameters())

This gives you two tuples of gradients, ordered just like the parameters; you can then plug each of them into your norm formula, as in the sketch below.
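A minimal end-to-end sketch of the idea; the random input and the two placeholder losses are made up just so the example runs:

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(1, 10),
    nn.Linear(10, 1)
)
input = torch.randn(8, 1)
output = model(input)

def loss1(out):  # placeholder loss, for illustration only
    return out.pow(2).mean()

def loss2(out):  # placeholder loss, for illustration only
    return out.abs().mean()

params = list(model.parameters())
grads1 = torch.autograd.grad(loss1(output), params, retain_graph=True)
grads2 = torch.autograd.grad(loss2(output), params, retain_graph=True)

def grad_norm(grads):
    # same formula as in your post, applied to a tuple of gradient tensors
    return torch.norm(torch.stack([torch.norm(g.detach(), 2.0) for g in grads]), 2.0)

print(grad_norm(grads1), grad_norm(grads2))

# if you still want to train on the combined objective afterwards:
total_loss = loss1(output) + loss2(output)
total_loss.backward()

The second torch.autograd.grad call only needs retain_graph=True here because the graph is reused once more for total_loss.backward(); if you only need the two norms, you can drop it on the last call.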

Best regards

Thomas

Thank you for your quick and detailed solution!