I assumed that max_norm in this function is the maximum norm allowed for each parameter individually. But the function computes a single total norm over all parameters combined.
Suppose there are two parameters with identical gradients, (3, 4) and (3, 4), each with an L2 norm of 5, and max_norm is set to 5.
I expected the function to leave the values unchanged, but it changed them.
Here total_norm is 50 ** 0.5, which is about 7.07, so each updated value is (3*5/7.07, 4*5/7.07) = (2.12, 2.83).
So the result depends on the number of parameters, because total_norm grows as more parameters are included.
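
To make the arithmetic above concrete, here is a minimal pure-Python sketch of clipping by a single global norm. The helper name clip_by_global_norm is hypothetical and only mirrors the behavior described in this post; the library function may differ in details (for example, a small epsilon in the denominator).

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Scale every gradient by one shared factor so that the L2 norm
    of all gradients taken together is at most max_norm.
    (Hypothetical helper for illustration, not the library function.)"""
    # One norm over every element of every gradient, not one norm per parameter.
    total_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    # Only ever shrink; leave gradients alone if they are already within the limit.
    scale = min(1.0, max_norm / total_norm) if total_norm > 0 else 1.0
    clipped = [tuple(g * scale for g in grad) for grad in grads]
    return clipped, total_norm

grads = [(3.0, 4.0), (3.0, 4.0)]
clipped, total_norm = clip_by_global_norm(grads, max_norm=5.0)
print(round(total_norm, 2))                                     # 7.07
print([tuple(round(g, 2) for g in grad) for grad in clipped])   # [(2.12, 2.83), (2.12, 2.83)]
```

With only one (3, 4) gradient in the list, total_norm is exactly 5 and nothing is scaled, which is why the outcome changes with the number of parameters.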
How is this function normally used, and how should I choose max_norm?