I have a question about computing the magnitude of the gradient contributed by each loss function.
For example, if I train one model with two losses, how can I compute the gradient magnitude due to loss1 and the gradient magnitude due to loss2 over the model parameters?
```python
model = nn.Sequential(
    nn.Linear(1, 10),
    nn.Linear(10, 1)
)
...
output = model(input)
total_loss = loss1(output) + loss2(output)
total_loss.backward()
...
```
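Would an approach like the following be correct? This is just a rough sketch of what I have in mind (the loss expressions are placeholders for my real `loss1(output)` and `loss2(output)`; the key idea is calling `backward` once per loss with `retain_graph=True` and zeroing the gradients in between, so `p.grad` holds only one loss's gradients at a time):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(1, 10), nn.Linear(10, 1))
input = torch.randn(8, 1)        # dummy batch, just for illustration
output = model(input)

loss1 = output.pow(2).mean()     # placeholder for the real loss1(output)
loss2 = output.abs().mean()      # placeholder for the real loss2(output)

model.zero_grad()
loss1.backward(retain_graph=True)  # p.grad now holds the loss1 gradients only
# ... measure the gradient norm here (see the definition below) ...

model.zero_grad()
loss2.backward()                   # p.grad now holds the loss2 gradients only
# ... measure the gradient norm here ...
```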
By "the magnitude of the gradient" I mean the global 2-norm below, which is the same quantity computed inside `torch.nn.utils.clip_grad_norm_`:
```python
torch.norm(
    torch.stack([torch.norm(p.grad.detach(), 2.0) for p in model.parameters()]),
    2.0,
)
```
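Alternatively, would `torch.autograd.grad` be the better tool here, since it returns the per-loss gradients directly without touching `.grad`? Another sketch of what I mean, reusing `model`, `loss1`, and `loss2` from above:

```python
import torch

def grad_norm(grads):
    # Global 2-norm over a sequence of gradient tensors, matching the
    # clip_grad_norm_-style expression above.
    return torch.norm(torch.stack([torch.norm(g.detach(), 2.0) for g in grads]), 2.0)

params = list(model.parameters())
# retain_graph=True keeps the graph alive so it can be differentiated again
grads1 = torch.autograd.grad(loss1, params, retain_graph=True)
grads2 = torch.autograd.grad(loss2, params, retain_graph=True)
print(grad_norm(grads1), grad_norm(grads2))
```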