I think you’re right.
So, in RNN optimization, does clipping the gradient of loss + L2 penalty make a big difference compared with clipping the gradient of the loss alone?
If it does, how should I implement the code so that the clipping covers loss + L2 penalty?
Many thanks.
I would remove the weight_decay argument to Adam and explicitly add the L2 penalty to the loss yourself:
for p in model.parameters():
    # squared L2 norm of each parameter tensor, scaled by the regularization weight
    loss = loss + options['reg'] * p.pow(2).sum()
loss.backward()
# clip the gradient of (data loss + L2 penalty); clip_grad_norm_ is the in-place,
# non-deprecated replacement for clip_grad_norm
torch.nn.utils.clip_grad_norm_(model.parameters(), options['clip_gradient_norm'])
optimizer.step()
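If I remember correctly, Adam's weight_decay term is added to the gradients inside optimizer.step(), i.e. after clip_grad_norm_ has already run, so with weight_decay the clipping only ever sees the data-loss gradient; folding the penalty into the loss as above means the clipped norm includes it, which is where a difference could show up. Here is a minimal self-contained sketch of one training step with this pattern, assuming a toy nn.RNN regression setup and made-up hyperparameters in options:

import torch
import torch.nn as nn

options = {'reg': 1e-4, 'clip_gradient_norm': 5.0}  # assumed hyperparameters

rnn = nn.RNN(input_size=10, hidden_size=20, batch_first=True)
head = nn.Linear(20, 1)
params = list(rnn.parameters()) + list(head.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)  # note: no weight_decay here
criterion = nn.MSELoss()

x = torch.randn(8, 15, 10)  # (batch, seq_len, features)
y = torch.randn(8, 1)

optimizer.zero_grad()
out, _ = rnn(x)
loss = criterion(head(out[:, -1]), y)

# explicit L2 penalty, so the subsequent clipping sees data loss + penalty
for p in params:
    loss = loss + options['reg'] * p.pow(2).sum()

loss.backward()
torch.nn.utils.clip_grad_norm_(params, options['clip_gradient_norm'])
optimizer.step()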