I have a question about computing the magnitude of the gradient contributed by each loss function.
For example, if I train one model with two losses, how can I compute the gradient magnitude due to loss1 and the gradient magnitude due to loss2 over the model parameters?
```python
model = nn.Sequential(
    nn.Linear(1, 10),
    nn.Linear(10, 1)
)
...
output = model(input)
total_loss = loss1(output) + loss2(output)
total_loss.backward()
...
```
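Would an approach like the following be correct? This is just a rough sketch of what I have in mind (the loss expressions are placeholders for my real `loss1(output)` and `loss2(output)`; the key idea is calling `backward` once per loss with `retain_graph=True` and zeroing the gradients in between, so `p.grad` holds only one loss's gradients at a time):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(1, 10), nn.Linear(10, 1))
input = torch.randn(8, 1)        # dummy batch, just for illustration
output = model(input)

loss1 = output.pow(2).mean()     # placeholder for the real loss1(output)
loss2 = output.abs().mean()      # placeholder for the real loss2(output)

model.zero_grad()
loss1.backward(retain_graph=True)  # p.grad now holds the loss1 gradients only
# ... measure the gradient norm here (see the definition below) ...

model.zero_grad()
loss2.backward()                   # p.grad now holds the loss2 gradients only
# ... measure the gradient norm here ...
```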
By "the magnitude of the gradient" I mean the global 2-norm below, which is the same quantity computed inside `torch.nn.utils.clip_grad_norm_`:
```python
torch.norm(
    torch.stack([torch.norm(p.grad.detach(), 2.0) for p in model.parameters()]),
    2.0,
)
```
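Alternatively, would `torch.autograd.grad` be the better tool here, since it returns the per-loss gradients directly without touching `.grad`? Another sketch of what I mean, reusing `model`, `loss1`, and `loss2` from above:

```python
import torch

def grad_norm(grads):
    # Global 2-norm over a sequence of gradient tensors, matching the
    # clip_grad_norm_-style expression above.
    return torch.norm(torch.stack([torch.norm(g.detach(), 2.0) for g in grads]), 2.0)

params = list(model.parameters())
# retain_graph=True keeps the graph alive so it can be differentiated again
grads1 = torch.autograd.grad(loss1, params, retain_graph=True)
grads2 = torch.autograd.grad(loss2, params, retain_graph=True)
print(grad_norm(grads1), grad_norm(grads2))
```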