F.normalize() gets zero gradient

My code is below:

import torch.nn.functional as F
from torch.autograd import Variable
import torch

a = Variable(torch.ones(1, 4), requires_grad=True)
norm = F.normalize(a, p=2, dim=1)
y = norm.dot(norm)
y.backward()
print(a.grad)

And I got this result:

Variable containing:
 0  0  0  0
[torch.FloatTensor of size 1x4]

It seems the gradient of F.normalize() is zero. When I change the line norm = F.normalize(a, p=2, dim=1) to

qn = torch.norm(a, p=2, dim=1).detach()
norm = a.div(qn.expand_as(a))

everything works fine. Can anyone explain this?
Thank you!

normalize, as its name suggests, normalizes a tensor (http://pytorch.org/docs/master/nn.html#torch.nn.functional.normalize). So when given a tensor containing all the same values, it has 0 grad as it is symmetrical.

norm returns the norm of a tensor.

They do different things.
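To illustrate the difference, here is a small sketch with an assumed toy input (the 3-4-5 values are just for illustration, not from the thread):

```python
import torch
import torch.nn.functional as F

t = torch.tensor([[3.0, 4.0]])

# F.normalize rescales each row to unit length:
print(F.normalize(t, p=2, dim=1))  # tensor([[0.6000, 0.8000]])

# torch.norm returns the length itself:
print(torch.norm(t, p=2, dim=1))   # tensor([5.])
```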

Thank you for your reply!
But I still cannot understand what you mean by “it has 0 grad as it is symmetrical”. As far as I know, normalize is a differentiable operation, and we can compute its gradient by the chain rule.
And I just read the source code of F.normalize(), which is quite similar to my second code except for the “detach()” operation.

def normalize(input, p=2, dim=1, eps=1e-12):
    return input / input.norm(p, dim, True).clamp(min=eps).expand_as(input)

Can you explain more clearly? Thank you!

Yes, I know that the grad is defined. I was just saying that it will be 0. Symmetric is probably not the best word. What I meant is that no matter what a is, y is always 1, so a's grad is always 0.
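To make this concrete, here is a small sketch (written against the current tensor API rather than Variable): y = ||normalize(a)||² is identically 1, so its gradient vanishes, while a function that actually depends on the direction of a gets a nonzero gradient through F.normalize just fine (the [1, 2, 3, 4] input is an arbitrary assumed example):

```python
import torch
import torch.nn.functional as F

# y = ||normalize(a)||^2 is identically 1, so its gradient w.r.t. a is 0.
a = torch.ones(1, 4, requires_grad=True)
norm = F.normalize(a, p=2, dim=1)
y = (norm * norm).sum()  # same quantity as norm.dot(norm) above
y.backward()
print(y.item())  # 1.0
print(a.grad)    # all zeros

# A function that depends on the direction of the input does get a
# nonzero gradient through F.normalize:
b = torch.tensor([[1.0, 2.0, 3.0, 4.0]], requires_grad=True)
z = F.normalize(b, p=2, dim=1)[0, 0]  # first component of the unit vector
z.backward()
print(b.grad)    # nonzero
```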

Oh I see! Thank you!