Hello All,
In the Weight Normalization code, we remove weight from the list of parameters and add two new parameters, v and g. But during the backward pass, computing the gradients with respect to v and g requires the gradient with respect to weight, as described in the paper.
If weight is removed from the list of parameters, how do we still have the gradient information with respect to weight? Could someone help me understand this?
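To make the question concrete, here is a minimal pure-Python sketch of how I read the paper's reparameterization, w = g * v / ||v||. The toy squared-error loss, the sample numbers, and the helper names (weight_from, loss_and_grad_w, grads_v_g) are my own assumptions for illustration, not the actual Weight Normalization code. If I read the paper right, w is no longer a stored parameter but is rebuilt from v and g, and its gradient shows up only as an intermediate value that the chain rule passes on to v and g.

```python
import math

def weight_from(v, g):
    """Rebuild w = g * v / ||v|| from the two parameters v and g."""
    norm = math.sqrt(sum(x * x for x in v))
    return [g * x / norm for x in v]

def loss_and_grad_w(w, x, target):
    """Toy loss L = 0.5 * (w . x - target)^2 and its gradient w.r.t. w.

    The loss is an assumption made only to have something to differentiate.
    """
    err = sum(wi * xi for wi, xi in zip(w, x)) - target
    return 0.5 * err * err, [err * xi for xi in x]

def grads_v_g(v, g, grad_w):
    """Chain rule as given in the paper: map grad_w to grad_v and grad_g.

    grad_g = (grad_w . v) / ||v||
    grad_v = (g / ||v||) * grad_w - (g * grad_g / ||v||^2) * v
    """
    norm = math.sqrt(sum(x * x for x in v))
    grad_g = sum(gw * vi for gw, vi in zip(grad_w, v)) / norm
    grad_v = [g / norm * gw - g * grad_g / (norm * norm) * vi
              for gw, vi in zip(grad_w, v)]
    return grad_v, grad_g

# Hypothetical sample values, chosen only for illustration.
v, g = [3.0, 4.0], 2.0
x, target = [1.0, -2.0], 0.5

w = weight_from(v, g)                          # w is derived, not stored
loss, grad_w = loss_and_grad_w(w, x, target)   # gradient w.r.t. w (intermediate)
grad_v, grad_g = grads_v_g(v, g, grad_w)       # gradients for the real parameters

print("grad_w:", grad_w)
print("grad_v:", grad_v)
print("grad_g:", grad_g)
```

In this sketch only v and g would be updated by the optimizer; grad_w is computed on the way and then discarded. What I would like to confirm is whether the real code works the same way.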
Thanks!