The cutoff threshold for gradient clipping is set based on the average norm of the gradient over one pass on the data. I would therefore like to compute the average norm of the gradient to find a fitting gradient clipping value for my model. How can this be done in PyTorch?
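One way this could be done, as a minimal sketch rather than a definitive recipe: run one pass over the data without updating the weights, record the total L2 norm of the gradients after each `backward()`, and average. The helper name `average_grad_norm` and the toy model and data below are made up for illustration.

```python
import torch
import torch.nn as nn

def average_grad_norm(model, batches, loss_fn):
    """Mean total L2 gradient norm over one pass; the weights are never updated."""
    norms = []
    for inputs, targets in batches:
        model.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        # Total norm: L2 norm of the per-parameter gradient norms,
        # i.e. the norm of all gradients viewed as one long vector.
        per_param = [p.grad.detach().norm(2)
                     for p in model.parameters() if p.grad is not None]
        norms.append(torch.stack(per_param).norm(2).item())
    return sum(norms) / len(norms)

# Hypothetical toy setup, only to make the sketch runnable.
torch.manual_seed(0)
toy_model = nn.Linear(4, 1)
toy_batches = [(torch.randn(8, 4), torch.randn(8, 1)) for _ in range(5)]
avg_norm = average_grad_norm(toy_model, toy_batches, nn.MSELoss())
print(f"average gradient norm: {avg_norm:.4f}")
```

`torch.nn.utils.clip_grad_norm_` also returns the total norm it measured, so inside an existing training loop the return value can be logged instead of computing the norm by hand.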
Another quick question: I have seen the following in the language modeling example:
# `clip_grad_norm` helps prevent the exploding gradient problem in RNNs / LSTMs.
for p in model.parameters():
`clip_grad_norm` is already applied to `model.parameters()`, so why do we need the for loop?
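For context, here is a small sketch of the two separate steps such a snippet usually contains. It assumes the loop body (not shown above) applies a plain SGD update by hand, and it uses the in-place `clip_grad_norm_` variant; the model, `lr`, and `max_norm` values are hypothetical. Clipping only rescales the `.grad` tensors, so something else still has to apply them to the weights.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 1)      # hypothetical toy model
lr, max_norm = 0.1, 0.25     # hypothetical values

loss = nn.MSELoss()(model(torch.randn(8, 4)), torch.randn(8, 1))
model.zero_grad()
loss.backward()

# Step 1: rescale the gradients in place so their total norm is at most
# max_norm. The weights themselves are not touched here.
before = [p.detach().clone() for p in model.parameters()]
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
weights_unchanged = all(torch.equal(b, p)
                        for b, p in zip(before, model.parameters()))

# Step 2: the for loop applies the (now clipped) gradients to the weights,
# which is the job an optimizer's step() would otherwise do.
with torch.no_grad():
    for p in model.parameters():
        p.add_(p.grad, alpha=-lr)
weights_changed = any(not torch.equal(b, p)
                      for b, p in zip(before, model.parameters()))

print(weights_unchanged, weights_changed)
```

If the loop in the example does something other than a manual parameter update, this sketch does not apply.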