How to compute the average norm of the gradient

Samrat_Hasan · April 8, 2017, 2:31am

The cutoff threshold for gradient clipping is set based on the average norm of the gradient over one pass over the data. I would therefore like to compute the average norm of the gradient to find a suitable gradient clipping value for my model. How can this be done in PyTorch?
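In symbols, the quantity being asked for can be written as follows. The symbols are introduced here for illustration: $g_t$ is the gradient of the loss with respect to all parameters at update $t$ (flattened into one vector, with $g_{t,p}$ the part belonging to parameter tensor $p$), $T$ is the number of updates in one pass over the data, and $c$ is the clipping threshold. The last line is the usual norm-based clipping rule that the threshold feeds into.

```latex
\|g_t\|_2 = \sqrt{\sum_{p} \|g_{t,p}\|_2^2},
\qquad
\bar{N} = \frac{1}{T} \sum_{t=1}^{T} \|g_t\|_2,
\qquad
\hat{g}_t =
\begin{cases}
  g_t & \text{if } \|g_t\|_2 \le c, \\[4pt]
  \dfrac{c}{\|g_t\|_2}\, g_t & \text{otherwise.}
\end{cases}
```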

Another quick question: I have seen the following in the language modeling example:

# `clip_grad_norm_` helps prevent the exploding gradient problem in RNNs / LSTMs.
torch.nn.utils.clip_grad_norm_(model.parameters(), args.clip)
# Manual SGD step: p <- p - lr * grad
for p in model.parameters():
    p.data.add_(p.grad, alpha=-lr)

If clip_grad_norm is already applied to model.parameters(), why do we need the for loop?

pranav · April 8, 2017, 2:33pm

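The for loop performs the gradient descent update, which the example implements manually: each parameter is decreased by its gradient times the learning rate. clip_grad_norm only rescales the gradients in place; it does not update the parameters themselves.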
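A minimal sketch of that division of labor, using a hypothetical one-layer model, random data, and arbitrary values for `lr` and `clip`. In current PyTorch the function is spelled `clip_grad_norm_` (the trailing underscore marks an in-place operation; the older `clip_grad_norm` name is deprecated). The sketch checks that clipping leaves the parameters untouched and that only the update loop changes them.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy model and hyperparameters, chosen only for illustration.
model = nn.Linear(4, 1)
lr, clip = 0.1, 0.5

x, y = torch.randn(8, 4), torch.randn(8, 1)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()

before = [p.detach().clone() for p in model.parameters()]

# Rescales the gradients in place so their total L2 norm is at most `clip`;
# returns the total norm measured *before* clipping.
total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), clip)

# Clipping touched only p.grad, not the parameters themselves.
unchanged = all(torch.equal(b, p) for b, p in zip(before, model.parameters()))

# Manual SGD step: p <- p - lr * grad (what the for loop in the example does).
for p in model.parameters():
    p.data.add_(p.grad, alpha=-lr)

changed = any(not torch.equal(b, p) for b, p in zip(before, model.parameters()))
print(f"norm before clipping: {total_norm.item():.4f}")
print("params unchanged after clipping:", unchanged)
print("params changed after update loop:", changed)
```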

As for your first question: if you mean the clipping of Pascanu et al., which is based on the norm of the gradient, then torch.nn.utils.clip_grad_norm does that for you. The clipping threshold is usually tuned as a hyperparameter, since there is no way to know in advance what the norm of the gradients will be during training.
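To get the average norm over one pass, one option is to record the total gradient norm (all gradients viewed as a single vector) after each `backward()` during an ordinary training epoch without clipping, then average. A minimal sketch, assuming a hypothetical toy model, random data, and plain SGD; `total_grad_norm` is a helper written here for illustration, computing the same quantity `clip_grad_norm_` returns before it rescales anything. Treat the resulting average as a starting point for tuning the threshold, not a final value.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

def total_grad_norm(parameters, norm_type=2.0):
    """Hypothetical helper: norm of all gradients viewed as one vector."""
    grads = [p.grad.detach() for p in parameters if p.grad is not None]
    per_param = torch.stack([torch.norm(g, norm_type) for g in grads])
    return torch.norm(per_param, norm_type)

# Hypothetical toy setup, for illustration only.
model = nn.Sequential(nn.Linear(10, 32), nn.Tanh(), nn.Linear(32, 1))
data = TensorDataset(torch.randn(256, 10), torch.randn(256, 1))
loader = DataLoader(data, batch_size=32, shuffle=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

norms = []
for x, y in loader:  # one pass over the data, with no clipping applied
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    norms.append(total_grad_norm(model.parameters()).item())
    optimizer.step()

avg_norm = sum(norms) / len(norms)
print(f"average gradient norm over {len(norms)} batches: {avg_norm:.4f}")
```

Calling `torch.nn.utils.clip_grad_norm_(model.parameters(), float('inf'))` in place of the helper should give the same numbers, since an infinite threshold never triggers rescaling.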

Samrat_Hasan · April 14, 2017, 12:04am

@smth, would it be possible for a future release of PyTorch to add some functionality to check the gradient norm? It would be very helpful.

smth · April 15, 2017, 4:40pm

You can check the gradient norm using hooks.
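One possible reading of "using hooks" is `Tensor.register_hook` on each parameter: the hook is called with that parameter's gradient during `backward()`, so it can record the norm. A minimal sketch, assuming a hypothetical toy model and a hypothetical `grad_norms` dictionary; the hook returns `None`, so the gradient itself is left unchanged. This illustrates the idea and is not necessarily what was meant in the reply.

```python
import math

import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy model, for illustration only.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))

grad_norms = {}  # parameter name -> L2 norm of its gradient in the latest backward pass

def make_hook(name):
    def hook(grad):
        # Called during backward() with the gradient w.r.t. this parameter.
        grad_norms[name] = grad.norm(2).item()
        # Returning None leaves the gradient unchanged.
    return hook

for name, p in model.named_parameters():
    p.register_hook(make_hook(name))

loss = model(torch.randn(16, 10)).pow(2).mean()
loss.backward()

# Combine the per-parameter norms into the total norm (all gradients as one vector).
total = math.sqrt(sum(n ** 2 for n in grad_norms.values()))
for name, n in grad_norms.items():
    print(f"{name}: {n:.4f}")
print(f"total: {total:.4f}")
```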

Samrat_Hasan · April 21, 2017, 2:55am

Could you please elaborate on your answer?