How to compute magnitude of gradient of each loss function?


I have a question about computing the magnitude of the gradient contributed by each loss function.

For example, if I use two losses for one deep model,
how can I compute the magnitude of the gradient produced by loss1 and the magnitude of the gradient produced by loss2 with respect to the model parameters?

model = nn.Sequential(
    nn.Linear(1, 10),
    nn.Linear(10, 1)
)
output = model(input)
total_loss = loss1(output) + loss2(output)

By the magnitude of the gradient I mean the total norm below, which is the same quantity computed inside “torch.nn.utils.clip_grad_norm_”:

torch.norm(torch.stack([torch.norm(p.grad.detach(), 2.0) for p in list(model.parameters())]), 2.0)

The short answer is that you cannot easily get this without running the backward pass for each loss separately, i.e.

grads1 = torch.autograd.grad(loss1(output), model.parameters(), retain_graph=True)
grads2 = torch.autograd.grad(loss2(output), model.parameters())

This gives you two tuples of gradients, ordered just like the parameters; you can then plug each of them into your norm formula.
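Putting the pieces together, here is a minimal end-to-end sketch. Note that loss1 and loss2 below are placeholder losses I made up for illustration (the original post does not define them), so substitute your own:

```python
import torch
import torch.nn as nn

# Placeholder losses for illustration only; replace with your real losses.
def loss1(output):
    return output.pow(2).mean()

def loss2(output):
    return output.abs().mean()

model = nn.Sequential(
    nn.Linear(1, 10),
    nn.Linear(10, 1),
)
inp = torch.randn(4, 1)
output = model(inp)

# Per-loss gradients w.r.t. the parameters; retain_graph=True keeps the
# graph alive so the second autograd.grad call can reuse it.
grads1 = torch.autograd.grad(loss1(output), model.parameters(), retain_graph=True)
grads2 = torch.autograd.grad(loss2(output), model.parameters())

def grad_norm(grads):
    # Same total 2-norm that clip_grad_norm_ computes, applied to an
    # explicit tuple of gradients instead of p.grad attributes.
    return torch.norm(torch.stack([torch.norm(g.detach(), 2.0) for g in grads]), 2.0)

print(grad_norm(grads1).item(), grad_norm(grads2).item())
```

Because autograd.grad returns the gradients directly instead of accumulating into p.grad, the two norms stay cleanly separated, and total_loss.backward() is not needed just to measure them.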

Best regards


Thank you for your quick and detailed solution!