Will `torch.nn.utils.clip_grad_norm` clip gradient of `loss+L2 penalty` or just `loss`?

I wonder whether `torch.nn.utils.clip_grad_norm` clips the gradient of `loss + L2 penalty` or just `loss`.

Here is my code:

```python
optimizer = Adam(model.parameters(), t_lr, weight_decay=options['reg'])
...
optimizer.zero_grad()
output = model(input_var)
loss = criterion(output, target_var)
loss.backward()
torch.nn.utils.clip_grad_norm(model.parameters(), options['clip_gradient_norm'])
optimizer.step()
```

The L2 penalty (the `weight_decay` argument) is applied by the optimizer inside `optimizer.step()`, which runs after the clipping call.

So clipping applies only to the gradients of the loss, without the L2 penalty.
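For intuition, here is a minimal sketch of that ordering. It uses a plain SGD-style update rather than the real `torch.optim.Adam` internals (the moment estimates are omitted, and the function name is just for illustration): inside `step()` the `weight_decay` term is added to the already-clipped gradient.

```python
import torch

@torch.no_grad()
def sketch_step(params, lr, weight_decay):
    # What optimizer.step() conceptually does when weight_decay is set:
    # the L2 term enters here, AFTER clip_grad_norm has already rescaled p.grad.
    for p in params:
        if p.grad is None:
            continue
        d_p = p.grad + weight_decay * p   # L2 penalty added to the clipped gradient
        p -= lr * d_p                     # parameter update (Adam's moments omitted)
```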


I think you're right.
So in RNN optimization, does clipping over loss + L2 penalty make a big difference compared to clipping over the loss only?
If it does, how should I implement code that clips over loss + L2 penalty?
Many thanks.

I would remove the `weight_decay` argument from Adam and explicitly add the L2 penalty to the loss:

```python
l2_penalty = 0.0
for p in model.parameters():
    l2_penalty = l2_penalty + p.pow(2).sum()   # sum of squared weights
loss = loss + options['reg'] * l2_penalty
loss.backward()
torch.nn.utils.clip_grad_norm(model.parameters(), options['clip_gradient_norm'])
optimizer.step()
```
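
For completeness, here is a sketch of the whole training step with the penalty folded into the loss. It reuses the (undefined) names from the question's snippet (`model`, `criterion`, `input_var`, `target_var`, `options`, `t_lr`). Note that the gradient of `reg * ||p||^2` is `2 * reg * p`, so scale by `0.5 * reg` if you want to match the optimizer's `weight_decay=reg` exactly.

```python
import torch
from torch.optim import Adam

optimizer = Adam(model.parameters(), t_lr)   # no weight_decay: the penalty lives in the loss now

optimizer.zero_grad()
output = model(input_var)
loss = criterion(output, target_var)

# L2 penalty over all parameters; 0.5 * reg matches weight_decay=reg in the optimizer
l2_penalty = sum(p.pow(2).sum() for p in model.parameters())
loss = loss + 0.5 * options['reg'] * l2_penalty

loss.backward()
# the clipped gradients now include the L2 term as well
torch.nn.utils.clip_grad_norm(model.parameters(), options['clip_gradient_norm'])
optimizer.step()
```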

Wonderful solution! Thank you so much.
