I encountered a strange behaviour that I don’t really understand. When I want to normalize a Variable, I run into a division-by-zero error during the backward pass whenever the standard deviation is zero.

For example:

import torch
from torch.autograd import Variable

a = Variable(torch.ones(1), requires_grad=True)
b = (a - a.mean())/(a.std() + 1e-4)
b.backward()
print(a.grad) # gives nan

I tracked the problem down, and it occurs even if I just divide by the std with a fairly large epsilon added:

import torch
from torch.autograd import Variable

a = Variable(torch.ones(1), requires_grad=True)
b = 1/(a.std() + 1)
b.backward()
print(a.grad) # gives nan

But shouldn’t the epsilon prevent the division by zero in the derivative?
So what is happening here? And how can I normalize without running into this error?

sqrt(x) is not defined for x < 0 (at least for real values), and evaluating it there gives nan. More importantly here, its derivative 1/(2*sqrt(x)) blows up as x approaches 0, so the derivative only exists for x > 0. Since std is computed as sqrt(var), and var is exactly 0 for a constant input, the backward pass evaluates the derivative of sqrt at 0 and produces inf/nan. Adding the epsilon *after* the sqrt doesn’t help, because the problematic sqrt has already been taken; the epsilon has to go *inside* the sqrt.
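Here is a minimal sketch of that fix (safe_normalize is a hypothetical helper name, not a PyTorch function; I use a 3-element tensor so the unbiased variance is well defined):

```python
import torch

# A tensor of identical values: its variance is 0, so std would
# evaluate sqrt at 0 and the gradient of sqrt blows up there.
a = torch.ones(3, requires_grad=True)

def safe_normalize(x, eps=1e-4):
    # Add epsilon *inside* the sqrt so the derivative
    # 1/(2*sqrt(var + eps)) stays finite even when var == 0.
    var = x.var()  # unbiased variance; 0 for constant input
    return (x - x.mean()) / torch.sqrt(var + eps)

b = safe_normalize(a).sum()
b.backward()
print(a.grad)  # finite gradients (zeros here), no nan
```

With the naive version, (a - a.mean())/(a.std() + eps), the backward pass still differentiates through sqrt(0) and the whole gradient becomes nan; moving the epsilon inside the sqrt avoids that point entirely.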