The cutoff threshold for gradient clipping is set based on the average norm of the gradient over one pass on the data. I would therefore like to compute the average norm of the gradient to find a fitting gradient clipping value for my model. How can this be done in PyTorch?
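For reference, here is a minimal sketch of one way this could be measured: run forward and backward over the data without any optimizer step, and record the total L2 norm of all gradients for each batch. The helper name `average_grad_norm`, the toy `nn.Linear` model, and the random batches are assumptions made for illustration only, not part of the language modeling example.

```python
import torch
import torch.nn as nn


def average_grad_norm(model, loss_fn, batches):
    """Mean total L2 gradient norm over the batches (no parameter updates)."""
    norms = []
    for inputs, targets in batches:
        model.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        # Total norm: L2 norm taken over the per-parameter gradient norms.
        per_param = [
            p.grad.detach().norm(2)
            for p in model.parameters()
            if p.grad is not None
        ]
        norms.append(torch.norm(torch.stack(per_param), 2).item())
    return sum(norms) / len(norms)


# Hypothetical toy setup, only to make the sketch runnable.
torch.manual_seed(0)
model = nn.Linear(4, 1)
batches = [(torch.randn(8, 4), torch.randn(8, 1)) for _ in range(5)]
avg_norm = average_grad_norm(model, nn.MSELoss(), batches)
print(avg_norm)
```

In current PyTorch versions, `torch.nn.utils.clip_grad_norm_` also returns the total norm it measured before clipping, so the same statistic could be logged during normal training instead of in a separate pass.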
Another quick question: I have seen the following in the language modeling example:
# `clip_grad_norm` helps prevent the exploding gradient problem in RNNs / LSTMs.
torch.nn.utils.clip_grad_norm(model.parameters(), args.clip)
for p in model.parameters():
    p.data.add_(-lr, p.grad.data)
clip_grad_norm is already applied to model.parameters(), so why do we need the for loop?
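To make the question concrete, here is a small sketch of how the two steps could be checked separately. It assumes the in-place `clip_grad_norm_` variant from current PyTorch, and the toy model, `lr`, and `clip` values are hypothetical. It records whether the parameters themselves change after clipping and after the loop.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 1)
lr, clip = 0.1, 0.25  # hypothetical values for illustration

# Scale the output so the gradients are large enough to be clipped.
loss = (model(torch.randn(8, 4)) * 100).pow(2).mean()
loss.backward()

before = [p.detach().clone() for p in model.parameters()]

# Step 1: clipping rescales p.grad in place; it does not touch p itself.
total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), clip)
after_clip = [p.detach().clone() for p in model.parameters()]
unchanged = all(torch.equal(a, b) for a, b in zip(before, after_clip))

# Step 2: the loop is the manual SGD update, p <- p - lr * grad.
with torch.no_grad():
    for p in model.parameters():
        p.add_(p.grad, alpha=-lr)
changed = any(
    not torch.equal(a, p.detach()) for a, p in zip(before, model.parameters())
)

# Norm of the gradients after clipping, for comparison with `clip`.
clipped_norm = torch.norm(
    torch.stack([p.grad.norm(2) for p in model.parameters()]), 2
).item()
print(unchanged, changed, float(total_norm), clipped_norm)
```

In this sketch the clipping call only modifies the gradients, and the weights move only once the loop runs, which suggests the loop is playing the role of the optimizer step.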