The cutoff threshold for gradient clipping is often chosen based on the average norm of the gradient over one pass through the data. I would therefore like to compute this average gradient norm to find a suitable clipping value for my model. How can this be done in PyTorch?
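
To make the first question concrete, here is a minimal sketch of what I have in mind. It assumes the "norm of the gradient" means the global L2 norm over all parameters, and the helper name `average_grad_norm`, the toy model, and the random batches are all hypothetical stand-ins, not taken from any example. It only runs forward and backward passes and never updates the parameters:

```python
import torch
import torch.nn as nn


def average_grad_norm(model, batches, loss_fn):
    """Mean global L2 gradient norm over the given batches (no parameter updates)."""
    total, count = 0.0, 0
    for inputs, targets in batches:
        model.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        # Global norm: L2 norm of the vector of per-parameter gradient norms.
        norms = [p.grad.norm(2) for p in model.parameters() if p.grad is not None]
        total += torch.stack(norms).norm(2).item()
        count += 1
    return total / max(count, 1)


# Toy stand-ins (hypothetical model and data, only to make the sketch runnable).
torch.manual_seed(0)
toy_model = nn.Linear(4, 1)
toy_batches = [(torch.randn(8, 4), torch.randn(8, 1)) for _ in range(5)]
avg_norm = average_grad_norm(toy_model, toy_batches, nn.MSELoss())
print(f"average gradient norm: {avg_norm:.4f}")
```

I am not sure whether this is the intended way to do it, or whether the average should instead be tracked during training, since the gradient norm presumably changes as the weights change.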

Another quick question: I have seen the following in the language modeling example:

```
# `clip_grad_norm` helps prevent the exploding gradient problem in RNNs / LSTMs.
torch.nn.utils.clip_grad_norm(model.parameters(), args.clip)
for p in model.parameters():
    p.data.add_(-lr, p.grad.data)
```

If `clip_grad_norm` is already applied to `model.parameters()`, why do we need the for loop?