I’m studying how to use weight_norm in PyTorch, but I ran into a weird problem.
This is my code:
import torch
import torch.nn as nn
from torch.nn.utils import weight_norm

W = weight_norm(nn.Linear(5, 3, bias=False))
x = torch.ones(5)
d = torch.sum(W(x))   # scalar loss: sum of the 3 outputs
d.backward()
W.weight_v.grad
After doing that, I expected W.weight_v.grad to be all ones, but the gradient looks strange:
W.weight_v.grad
tensor([[0.2237, 1.4274, 0.7918, 0.8525, 0.1773],
        [0.4810, 1.4245, 0.7913, 0.4601, 1.0554],
        [0.9531, 0.8751, 0.9691, 1.0501, 1.1175]])
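To sanity-check this, I rebuilt the effective weight by hand and re-ran the same loss. This is just a minimal sketch continuing from the snippet above, assuming the documented reparameterization w = g * v / ||v||, with the norm taken per output row (the default dim=0); the gradients do match:

# Rebuild w = g * v / ||v|| from detached copies of the raw parameters
v = W.weight_v.detach().requires_grad_(True)
g = W.weight_g.detach()
w = g * v / v.norm(dim=1, keepdim=True)   # norm per output row
torch.sum(w @ x).backward()               # same scalar loss as d above
print(torch.allclose(v.grad, W.weight_v.grad))  # prints True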
And another weird thing happened:
W.weight_g.grad
tensor([[-1.2358],
        [-0.8875],
        [-0.1872]])
I set bias=False, so why does W.weight_g have a gradient??
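From the docs, weight_g seems to be the magnitude g in w = g * v / ||v||, not the bias, so if I understand correctly its gradient for this loss should be (v_i · x) / ||v_i|| per output row. A quick check, continuing from the snippets above (this formula is my own derivation, not something from the docs):

# dL/dg_i = (v_i . x) / ||v_i||, since W(x)_i = g_i * (v_i . x) / ||v_i||
with torch.no_grad():
    expected = (W.weight_v @ x / W.weight_v.norm(dim=1)).unsqueeze(1)
print(torch.allclose(W.weight_g.grad, expected))  # prints True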