Calculation of gradient w.r.t. v and g in WeightNormalization

Hello All,

In the Weight Normalization code, we remove weight from the list of parameters and add two new parameters, v and g. But during the backward pass, we need the gradient w.r.t. weight in order to calculate the gradients w.r.t. v and g, as described in the paper.

But if we remove weight from the list of parameters, how would we have the gradient information w.r.t. weight? Could someone help me understand this?
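
To make the question concrete, here is a minimal sketch of the chain-rule step as I understand it from the paper, written in plain Python rather than with the actual library code. It assumes weight = g * v / ||v|| for a single weight vector; the function names and the grad_w argument are hypothetical and chosen only for illustration. grad_w stands for the gradient of the loss w.r.t. the recomputed weight, and where that value comes from once weight is no longer a parameter is exactly the part I am unsure about.

```python
import math


def weight_norm_forward(v, g):
    """Recompute weight = g * v / ||v|| from v (list of floats) and g (scalar)."""
    norm_v = math.sqrt(sum(x * x for x in v))
    weight = [g * x / norm_v for x in v]
    return weight, norm_v


def weight_norm_backward(grad_w, v, g, norm_v):
    """Map the gradient w.r.t. weight onto gradients w.r.t. v and g.

    grad_w is assumed to be given (hypothetical input for this sketch).
    """
    # grad_g = (grad_w . v) / ||v||
    grad_g = sum(gw * x for gw, x in zip(grad_w, v)) / norm_v
    # grad_v = (g / ||v||) * grad_w - (g * grad_g / ||v||^2) * v
    grad_v = [
        (g / norm_v) * gw - (g * grad_g / norm_v ** 2) * x
        for gw, x in zip(grad_w, v)
    ]
    return grad_v, grad_g


if __name__ == "__main__":
    v, g = [3.0, 4.0], 2.0
    weight, norm_v = weight_norm_forward(v, g)
    grad_v, grad_g = weight_norm_backward([1.0, 0.0], v, g, norm_v)
    print("weight:", weight)
    print("grad_v:", grad_v, "grad_g:", grad_g)
```

In this sketch both outputs depend on grad_w, which is why I do not see how v and g can be updated without it.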

Thanks!