I don't fully understand `torch.nn.utils.clip_grad`. I was reading the following code:
http://pytorch.org/docs/master/_modules/torch/nn/utils/clip_grad.html#clip_grad_norm

I assumed `max_norm` was the maximum norm allowed for each parameter's gradient. But the function computes a single total norm over the gradients of all parameters.
Suppose there are two parameters with identical gradients, `(3, 4)` and `(3, 4)`, each with an L2 norm of `5`, and `max_norm` is set to `5`.
I expected the function to leave the gradients unchanged, but it changed them.

Here `total_norm` is `50 ** 0.5`, which is approximately 7.07, so each gradient becomes `(3*5/7.07, 4*5/7.07) = (2.12, 2.83)`.
So the result depends on the number of parameters, because every gradient contributes to `total_norm`.
How is this function normally used, and how should I choose `max_norm`?
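
To double-check the arithmetic above, here is a minimal pure-Python sketch of the scaling as I understand it. It is not the library's code: `clip_by_global_norm` is a name made up for illustration, plain lists stand in for gradient tensors, and the small `eps` in the denominator is an assumption added to avoid dividing by zero.

```python
import math

def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Scale every gradient by one shared factor so the combined L2 norm is at most max_norm.

    Hypothetical helper for illustration only; grads is a list of lists of floats.
    """
    # One norm over all gradient entries, not one norm per parameter.
    total_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    clip_coef = max_norm / (total_norm + eps)
    if clip_coef < 1:
        # The same coefficient is applied to every gradient.
        grads = [[g * clip_coef for g in grad] for grad in grads]
    return grads, total_norm

grads = [[3.0, 4.0], [3.0, 4.0]]
clipped, total_norm = clip_by_global_norm(grads, max_norm=5.0)

print(round(total_norm, 2))                                # 7.07
print([[round(g, 2) for g in grad] for grad in clipped])   # [[2.12, 2.83], [2.12, 2.83]]
```

Because one coefficient is shared, the direction of the overall gradient is preserved and only its length shrinks. With a larger threshold such as `max_norm=10.0`, the same input comes back unchanged.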

I found only one example that uses this function.

I found the explanation in the docs:
“The norm is computed over all gradients together, as if they were concatenated into a single vector.”
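
That sentence matches the `50 ** 0.5` figure: the L2 norm of the concatenated gradients equals the L2 norm of the per-parameter norms. A small pure-Python check, where `l2_norm` is a helper written for this sketch and plain lists stand in for tensors:

```python
import math

def l2_norm(values):
    """L2 norm of a flat list of floats (helper written for this sketch)."""
    return math.sqrt(sum(v * v for v in values))

grads = [[3.0, 4.0], [3.0, 4.0]]

# Treat all gradients as one long vector: (3, 4, 3, 4).
concatenated = [g for grad in grads for g in grad]
norm_of_concat = l2_norm(concatenated)

# Alternatively, take each parameter's norm (5 and 5), then the norm of those.
norm_of_norms = l2_norm([l2_norm(grad) for grad in grads])

print(math.isclose(norm_of_concat, norm_of_norms))  # True
print(round(norm_of_concat, 2))                     # 7.07
```

So the total is the square root of the sum of squared per-parameter norms, not a plain sum of the norms, and it grows as more parameters are included.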

