I’m studying how to use weight_norm in PyTorch, but I ran into a weird problem.
This is my code:
import torch
import torch.nn as nn
from torch.nn.utils import weight_norm

W = weight_norm(nn.Linear(5, 3, bias=False))
x = torch.ones(5)
d = torch.sum(W(x))   # scalar loss: sum of the 3 outputs
d.backward()
W.weight_v.grad
After doing that, I expected W.weight_v.grad to be all ones, but the gradient looks strange:
W.weight_v.grad
tensor([[0.2237, 1.4274, 0.7918, 0.8525, 0.1773],
        [0.4810, 1.4245, 0.7913, 0.4601, 1.0554],
        [0.9531, 0.8751, 0.9691, 1.0501, 1.1175]])
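To sanity-check this, I rebuilt the effective weight by hand and re-ran the same loss. This is just a minimal sketch continuing from the snippet above, assuming the documented reparameterization w = g * v / ||v||, with the norm taken per output row (the default dim=0); the gradients do match:

# Rebuild w = g * v / ||v|| from detached copies of the raw parameters
v = W.weight_v.detach().requires_grad_(True)
g = W.weight_g.detach()
w = g * v / v.norm(dim=1, keepdim=True)   # norm per output row
torch.sum(w @ x).backward()               # same scalar loss as d above
print(torch.allclose(v.grad, W.weight_v.grad))  # prints True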
And another weird thing happened:
W.weight_g.grad
tensor([[-1.2358],
        [-0.8875],
        [-0.1872]])
I set bias=False, so why does W.weight_g have a gradient??
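From the docs, weight_g seems to be the magnitude g in w = g * v / ||v||, not the bias, so if I understand correctly its gradient for this loss should be (v_i · x) / ||v_i|| per output row. A quick check, continuing from the snippets above (this formula is my own derivation, not something from the docs):

# dL/dg_i = (v_i . x) / ||v_i||, since W(x)_i = g_i * (v_i . x) / ||v_i||
with torch.no_grad():
    expected = (W.weight_v @ x / W.weight_v.norm(dim=1)).unsqueeze(1)
print(torch.allclose(W.weight_g.grad, expected))  # prints True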