# About torch.nn.utils.clip_grad

I can't understand `torch.nn.utils.clip_grad` correctly. I saw the following code:
http://pytorch.org/docs/master/_modules/torch/nn/utils/clip_grad.html#clip_grad_norm

In this function, I thought `max_norm` was the maximum norm allowed for each parameter, but it actually computes the norm over all the gradients together.
Suppose there are two identical gradients, `(3, 4)` and `(3, 4)`, each with L2 norm `5`, and `max_norm` is `5`.
I expected this function to leave the gradients unchanged, but it modified them.

Here, `total_norm` is `50 ** 0.5`, which is about `7.07`. So each gradient is rescaled to `(3*5/7.07, 4*5/7.07) = (2.12, 2.83)`.
So the result depends on the number of parameters, through `total_norm`.
How do I usually use this function, and how should I set `max_norm`?
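The arithmetic in the question can be reproduced in plain Python. This is a minimal sketch of the math `clip_grad_norm` performs (not the library code itself), assuming the two example gradients `(3, 4)` and `(3, 4)` and `max_norm = 5`:

```python
import math

# Two gradients, each with L2 norm 5 (example values from the question).
grads = [[3.0, 4.0], [3.0, 4.0]]
max_norm = 5.0

# The norm is computed over ALL gradients together:
# total_norm = sqrt(sum of each gradient's squared L2 norm) = sqrt(25 + 25)
total_norm = math.sqrt(sum(sum(g * g for g in grad) for grad in grads))
print(total_norm)  # sqrt(50) ≈ 7.0711

# Since total_norm > max_norm, every gradient is scaled by max_norm / total_norm.
clip_coef = max_norm / total_norm
clipped = [[g * clip_coef for g in grad] for grad in grads]
print(clipped[0])  # ≈ [2.1213, 2.8284]
```

This reproduces the `(2.12, 2.83)` result above: even though each gradient individually has norm exactly `max_norm`, the combined norm exceeds it, so everything gets scaled down.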

I found only one example of this function being used.


Please reply to this; I have a similar query.

Thanks


Hoping for some answers.

I found the explanation in the doc:
"The norm is computed over all gradients together, as if they were concatenated into a single vector."
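That sentence from the doc can be checked directly: taking the norm of each gradient's squared entries summed together gives the same number as concatenating the gradients into one vector and taking its norm. A small sketch, reusing the example gradients from the question:

```python
import math

# Example gradients from the question.
g1 = [3.0, 4.0]
g2 = [3.0, 4.0]

# Norm computed over all gradients "together": sqrt of the sum of all squared entries.
per_param = math.sqrt(sum(x * x for x in g1) + sum(x * x for x in g2))

# Norm of the single concatenated vector — the doc says these are the same thing.
as_one_vector = math.sqrt(sum(x * x for x in (g1 + g2)))

print(per_param, as_one_vector)  # both sqrt(50) ≈ 7.0711
```

So the `total_norm` in the question is just the L2 norm of the four numbers `(3, 4, 3, 4)` treated as one vector, which is why it comes out larger than either gradient's individual norm.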
