import torch
import torch.nn.functional as F
from torch.autograd import Variable  # pre-0.4 autograd wrapper

a = Variable(torch.ones(1, 4), requires_grad=True)
norm = F.normalize(a, p=2, dim=1)  # L2-normalize along dim 1: norm is a unit vector
y = norm.dot(norm)                 # dot product of the unit vector with itself (old PyTorch flattens the 1x4 here)
y.backward()
print(a.grad)
And I got this result:
Variable containing:
0 0 0 0
[torch.FloatTensor of size 1x4]
It seems the gradient of F.normalize() is zero. When I change the line norm = F.normalize(a, p=2, dim=1) to …
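A hypothetical reconstruction of that manual normalization (the actual second snippet is not quoted here, so this is an illustration only, not the poster's code):

# hypothetical second snippet: normalize by hand instead of F.normalize
norm = a / a.norm(2, dim=1, keepdim=True)
y = norm.dot(norm)  # as in the first snippet
y.backward()
print(a.grad)  # still zeros: y is identically 1, exactly as with F.normalize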
Thank you for your reply!
But I still cannot understand what you mean by “it has 0 grad as it is symmetrical”. As far as I know, normalization is a differentiable operation, and we can compute its gradient by the chain rule.
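Indeed it is differentiable, and the chain rule also shows why this particular y has zero gradient. A short derivation in standard notation (not from the thread): with $n(a) = a / \lVert a\rVert_2$ and $y = n \cdot n$,

\frac{\partial n_j}{\partial a_i} = \frac{\delta_{ij}}{\lVert a\rVert} - \frac{a_i a_j}{\lVert a\rVert^{3}},
\qquad
\frac{\partial y}{\partial a_i} = \sum_j 2\,n_j\,\frac{\partial n_j}{\partial a_i}
= \frac{2 a_i}{\lVert a\rVert^{2}} - \frac{2 a_i}{\lVert a\rVert^{2}} = 0.

So the Jacobian of normalization is non-zero in general; it is the composition with this specific y that kills the gradient.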
And I just read the source code of F.normalize(), which is quite similar to my second code except for the detach() operation.
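If that second code divided by a detached copy of the norm (an assumption; the snippet is not quoted above), then the denominator is a constant as far as autograd is concerned, and the gradient is no longer zero. A minimal sketch:

# hypothetical: divide by a detached norm, so the denominator leaves the graph
b = Variable(torch.ones(1, 4), requires_grad=True)
n = b / b.norm(2, dim=1, keepdim=True).detach()
z = (n * n).sum()  # same value as n.dot(n)
z.backward()
print(b.grad)  # non-zero: z = (b . b) / c^2 with c held constant, grad = 2b / c^2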
Yes, I know that the grad is defined. I was just saying that it will be 0. “Symmetrical” was probably not the best word. What I meant is: no matter what a is, y is always 1, so a's grad is always 0.
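A quick numerical check of this explanation, written against the current tensor API rather than Variable (an adaptation, not the thread's original code):

import torch
import torch.nn.functional as F

a = torch.randn(1, 4, requires_grad=True)
norm = F.normalize(a, p=2, dim=1)
y = (norm * norm).sum()  # equivalent to norm.dot(norm) in the old API
print(y.item())          # always 1.0, whatever a is
y.backward()
print(a.grad)            # zeros (up to floating-point noise)

# A loss that actually depends on the direction of a does get a
# non-zero gradient through F.normalize:
a2 = torch.randn(1, 4, requires_grad=True)
w = torch.randn(1, 4)  # fixed reference direction
loss = (F.normalize(a2, p=2, dim=1) * w).sum()
loss.backward()
print(a2.grad)           # generally non-zero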